Metadata-Version: 2.4
Name: wowool-semantic-themes
Version: 3.3.0
Summary: Wowool Semantic Themes
Author: Wowool
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: nlp-wowool-sdk
Dynamic: author
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Categorizing your documents

The themes app identifies the most relevant themes (categories) of a document. It collects the entities that carry the `theme` attribute and then determines which categories fit the document best.

<note>Themes are predefined categories into which you want to sort your documents, whereas [topics](https://www.wowool.com/docs/apps/topics) are salient noun groups extracted from the processed documents.</note>

## Prerequisites

The `themes.app` uses the `Theme` entity and the annotation attributes `theme` and `sector` to collect information to identify potential categories. To enable this functionality, ensure that the `semantic-themes` domain is included in your processing pipeline or that you have a custom domain that produces `Theme` entities.

## Options

#### ThemesOptions

```typescript
interface ThemesOptions {
  collect?: Record<string, UriInfo>;
  attributes?: string[];
  count?: number;
  threshold?: number;
}
```

with:

| Property     | Description                                                                                              |
| ------------ | -------------------------------------------------------------------------------------------------------- |
| `collect`    | Specifies which entities and which of their information are considered during categorization             |
| `attributes` | Attributes that are considered as theme candidates. Default: `['theme', 'sector']`                       |
| `count`      | Maximum number of themes to collect                                                                      |
| `threshold`  | Minimal probability, expressed as a percentage, for a theme candidate to be considered a theme           |
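
As a rough illustration of how these options fit together, the sketch below writes them as a plain Python dictionary and checks them against the documented shape. The helper `check_themes_options` is hypothetical and not part of the wowool API. The option values are illustrative, and the range checks (a positive `count`, a `threshold` between 0 and 100) are assumptions derived from the descriptions in the table.

```python
from typing import Any

# Hypothetical helper, not part of the wowool API: it only checks that a
# plain dict follows the ThemesOptions shape documented above.
ALLOWED_KEYS = {"collect", "attributes", "count", "threshold"}


def check_themes_options(options: dict[str, Any]) -> dict[str, Any]:
    """Return the options unchanged if they match the documented shape."""
    unknown = set(options) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    count = options.get("count")
    # Assumption: a maximum number of themes should be a positive integer.
    if count is not None and (not isinstance(count, int) or count < 1):
        raise ValueError("count must be a positive integer")
    threshold = options.get("threshold")
    # Assumption: a percentage lies between 0 and 100.
    if threshold is not None and not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return options


options = check_themes_options(
    {
        "attributes": ["theme", "sector"],  # the documented default
        "count": 5,  # illustrative value
        "threshold": 25,  # illustrative value, in percent
    }
)
print(options)
```

All four properties are optional, so an empty dictionary also passes the check.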

#### UriInfo

`UriInfo` lets you include or exclude specific URIs during the categorization process.

```typescript
interface UriInfo {
  uri: boolean;
  attributes: string[];
}
```

with:

| Property     | Description                                                                        |
| ------------ | ---------------------------------------------------------------------------------- |
| `uri`        | Specifies whether the entity canonical should be considered a theme candidate      |
| `attributes` | Attributes that are considered as theme candidates. Default: `['theme', 'sector']` |
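
A `collect` mapping built from `UriInfo` entries could look like the sketch below. The `TypedDict` is only a Python mirror of the interface above, not a class shipped with the package. The entity names and values are illustrative, and reading an empty `attributes` list as "no attributes" is an assumption.

```python
from typing import TypedDict


class UriInfo(TypedDict):
    """Python mirror of the UriInfo interface above (illustrative only)."""

    uri: bool
    attributes: list[str]


# Keys are entity URIs; each value says what that entity may contribute.
collect: dict[str, UriInfo] = {
    # Use the `position` attribute of Person entities, not the person's name.
    "Person": {"uri": False, "attributes": ["position"]},
    # Use the canonical of Country entities; the empty list is assumed to
    # mean that none of their attributes are considered.
    "Country": {"uri": True, "attributes": []},
}

for uri, info in collect.items():
    source = "canonical" if info["uri"] else "attributes only"
    print(f"{uri}: {source}, attributes={info['attributes']}")
```

Such a mapping would be passed as the `collect` property of `ThemesOptions`.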

## Results

#### ThemesResults

```typescript
type ThemesResults = ThemesResult[];
```

#### ThemesResult

```typescript
interface ThemesResult {
  name: string;
  relevancy: number;
}
```

with:

| Property    | Description                                |
| ----------- | ------------------------------------------ |
| `name`      | Name of the theme                          |
| `relevancy` | Relevancy of the theme within the document |
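
Because each result is just a name with a relevancy score, a result list is easy to post-process on the client side. The sketch below uses a stand-in dataclass and made-up values. It is not the app's internal ranking logic; it only shows one way to apply a cut-off and a maximum count to results you already have.

```python
from dataclasses import dataclass


@dataclass
class ThemeResult:
    """Stand-in for one ThemesResult entry (name and relevancy)."""

    name: str
    relevancy: float


def top_themes(
    results: list[ThemeResult], count: int, threshold: float
) -> list[ThemeResult]:
    """Keep entries with relevancy >= threshold, most relevant first, at most `count`."""
    kept = [r for r in results if r.relevancy >= threshold]
    kept.sort(key=lambda r: r.relevancy, reverse=True)
    return kept[:count]


# Made-up values, only to exercise the helper.
results = [
    ThemeResult("business", 50),
    ThemeResult("ceo", 100),
    ThemeResult("sports", 10),
]
selected = top_themes(results, count=2, threshold=25)
print([(t.name, t.relevancy) for t in selected])
# prints [('ceo', 100), ('business', 50)]
```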

## Examples

### Persons and events

Let's say you want to consider the `position` attribute from `Person`, the name of the `Country` and the global attributes for `Event`. Then the following configuration can be used:

<sample data-uuid="themes_example"></sample>

which yields as output:

```json
[
  { "name": "ceo", "relevancy": 100 },
  { "name": "terrorism", "relevancy": 50 },
  { "name": "business", "relevancy": 50 },
  { "name": "aerospace", "relevancy": 50 },
  { "name": "USA", "relevancy": 50 }
]
```

# API

## Examples

### Using the Semantic Themes

This sample demonstrates how to use the `Themes` app from the `wowool.semantic_themes` package to extract and rank semantic themes from a document.

* `Pipeline` is used to create a text analysis pipeline.
* `Themes` is the app that extracts and ranks themes.

```python
from wowool.sdk import Pipeline
from wowool.semantic_themes import Themes

# Build the text analysis pipeline.
analysis = Pipeline("english,entity,semantic-theme")
# Keep at most two themes per document.
themes = Themes(count=2)
# Run the document analysis, then apply the Themes app to the result.
document = themes(analysis("EyeOnID works on cybercrime prevention"))
# Print each theme with its relevancy.
for item in document.themes:
    print(f" - {item.name}: {item.relevancy}")
```



## License

For both non-commercial and commercial use, you need to acquire a license file at https://www.wowool.com

### Non-Commercial

    This library is licensed under the GNU AGPLv3 for non-commercial use.  
    For commercial use, a separate license must be purchased.  

### Commercial License Terms

    1. Grants the right to use this library in proprietary software.  
    2. Requires a valid license key.  
    3. Redistribution in SaaS requires a commercial license.  
