Metadata-Version: 2.1
Name: wowool-semantic-themes
Version: 3.2.0
Summary: Wowool Semantic Themes
Author: Wowool
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: wowool-sdk

# Categorizing your documents

The themes app identifies the most relevant themes (categories) of a document. It collects the entities that carry a `theme` attribute and then determines which categories best fit the document.

<note>Themes are predefined categories into which you want to classify your documents, whereas [topics](https://www.wowool.com/docs/apps/topics) are salient noun groups extracted from the processed documents.</note>
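This README does not specify the scoring that `themes.app` uses. As a rough mental model only, the Python sketch below does four things. It counts the `theme` attribute values collected from a document's entities. It scales relevancy so that the most frequent theme scores 100. It then applies a `threshold` and a `count` limit, both described under Options. The function `rank_themes` and its frequency-based scoring are assumptions for illustration, not the library's algorithm.

```python
from collections import Counter


def rank_themes(theme_values: list[str], count: int = 5, threshold: float = 0.0) -> list[dict]:
    """Hypothetical sketch: frequency-based ranking, not the wowool algorithm.

    Relevancy is each theme's frequency as a percentage of the most frequent theme.
    """
    if not theme_values:
        return []
    counts = Counter(theme_values)
    top = max(counts.values())
    # most_common() orders by frequency; ties keep their first-seen order
    ranked = [
        {"name": name, "relevancy": round(100 * n / top)}
        for name, n in counts.most_common()
    ]
    # drop candidates below the threshold, then keep at most `count` themes
    return [r for r in ranked if r["relevancy"] >= threshold][:count]


# theme attribute values collected from the entities of a hypothetical document
values = ["cybercrime", "security", "cybercrime", "technology"]
print(rank_themes(values, count=2))
# [{'name': 'cybercrime', 'relevancy': 100}, {'name': 'security', 'relevancy': 50}]
```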

## Prerequisites

The `themes.app` relies on the `Theme` entity and on the annotation attributes `theme` and `sector` to identify candidate categories. To enable this functionality, make sure that your processing pipeline includes the `semantic-themes` domain or a custom domain that produces `Theme` entities.

## Options

#### ThemesOptions

```typescript
interface ThemesOptions {
    collect?: Record<string, UriInfo>;
    attributes?: string[];
    count?: number;
    threshold?: number;
}
```

with:

| Property     | Description                                                                                              |
|--------------|----------------------------------------------------------------------------------------------------------|
| `collect`    | Maps entity URIs to the `UriInfo` that defines what is considered during the categorization process      |
| `attributes` | Attributes that are considered as theme candidates. Default: `['theme', 'sector']`                       |
| `count`      | Maximum number of themes to collect                                                                      |
| `threshold`  | Minimum probability, expressed as a percentage, for a theme candidate to be considered a theme           |
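Python users can write the same option shapes as `TypedDict` classes. The sketch below is a hypothetical mirror of the TypeScript interfaces, including `UriInfo` from the next subsection, plus an illustrative `check_options` helper. Neither is part of the `wowool.semantic_themes` API. The validation rules (a positive `count` and a `threshold` between 0 and 100) are assumptions based on the property descriptions in the table above.

```python
from typing import TypedDict


class UriInfo(TypedDict):
    uri: bool  # whether the entity's canonical form is a theme candidate
    attributes: list[str]  # attributes whose values become theme candidates


class ThemesOptions(TypedDict, total=False):  # every key is optional
    collect: dict[str, UriInfo]
    attributes: list[str]
    count: int
    threshold: float


def check_options(options: ThemesOptions) -> list[str]:
    """Hypothetical helper: return the problems found in an options dict (empty if none)."""
    problems = []
    if "count" in options and options["count"] < 1:
        problems.append("count must be a positive integer")
    if "threshold" in options and not 0 <= options["threshold"] <= 100:
        problems.append("threshold is a percentage and must lie in [0, 100]")
    for uri, info in options.get("collect", {}).items():
        if not isinstance(info.get("uri"), bool):
            problems.append(f"collect[{uri!r}].uri must be a boolean")
    return problems


options: ThemesOptions = {"attributes": ["theme", "sector"], "count": 2, "threshold": 30}
print(check_options(options))  # []
```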

#### UriInfo

Use `UriInfo` when you want the app to include or exclude specific URIs during the categorization process.

```typescript
interface UriInfo {
    uri: boolean;
    attributes: string[];
}
```

with:

| Property     | Description                                                                        |
|--------------|------------------------------------------------------------------------------------|
| `uri`        | Specifies whether the entity's canonical form is considered a theme candidate      |
| `attributes` | Attributes that are considered as theme candidates. Default: `['theme', 'sector']` |
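The Python sketch below shows how a `UriInfo` entry could turn a single entity into theme candidates. It is a conceptual illustration only. The `Entity` class and `theme_candidates` function are hypothetical simplifications, not wowool SDK types. The fallback for entity URIs that are not listed in `collect` (use the global attributes and skip the canonical) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical, simplified stand-in for an annotated entity."""
    uri: str
    canonical: str
    attributes: dict[str, list[str]] = field(default_factory=dict)


def theme_candidates(entity: Entity, collect: dict[str, dict], default_attributes: list[str]) -> list[str]:
    """Collect candidate theme names from one entity according to a UriInfo-like entry."""
    info = collect.get(entity.uri)
    if info is None:
        # assumption: URIs missing from `collect` fall back to the global attributes
        attributes, use_canonical = default_attributes, False
    else:
        attributes = info.get("attributes", default_attributes)
        use_canonical = info.get("uri", False)
    # `uri: True` makes the canonical form itself a candidate
    candidates = [entity.canonical] if use_canonical else []
    for name in attributes:
        candidates.extend(entity.attributes.get(name, []))
    return candidates


collect = {
    "Person": {"uri": False, "attributes": ["position"]},
    "Country": {"uri": True, "attributes": []},
}
person = Entity("Person", "Jane Doe", {"position": ["ceo"]})
country = Entity("Country", "USA")
print(theme_candidates(person, collect, ["theme", "sector"]))   # ['ceo']
print(theme_candidates(country, collect, ["theme", "sector"]))  # ['USA']
```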

## Results

#### ThemesResults

```typescript
type ThemesResults = ThemesResult[];
```

#### ThemesResult

```typescript
interface ThemesResult {
    name: string;
    relevancy: number;
}
```

with:

| Property    | Description                                |
|-------------|--------------------------------------------|
| `name`      | Name of the theme                          |
| `relevancy` | Relevancy of the theme within the document |
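Because `ThemesResults` is a plain list of `{name, relevancy}` objects, post-processing is ordinary list handling. The Python sketch below filters and orders results that are already available as dictionaries. In the API example, each item is indexed with `item['name']` and `item['relevancy']`. The helper `themes_above` and the sample values are hypothetical.

```python
def themes_above(results: list[dict], min_relevancy: float) -> list[str]:
    """Return theme names whose relevancy is at least `min_relevancy`, most relevant first."""
    kept = [r for r in results if r["relevancy"] >= min_relevancy]
    # stable sort: themes with equal relevancy keep their original order
    kept.sort(key=lambda r: r["relevancy"], reverse=True)
    return [r["name"] for r in kept]


results = [
    {"name": "business", "relevancy": 50},
    {"name": "ceo", "relevancy": 100},
    {"name": "aerospace", "relevancy": 20},
]
print(themes_above(results, 50))  # ['ceo', 'business']
```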

## Examples

### Persons and events

Suppose you want to consider the `position` attribute of `Person` entities, the name of `Country` entities, and the global attributes for `Event` entities. You can then use the following configuration:

<sample data-uuid="themes_example"></sample>

This configuration yields the following output:

```json
[
  { "name": "ceo", "relevancy": 100 },
  { "name": "terrorism", "relevancy": 50 },
  { "name": "business", "relevancy": 50 },
  { "name": "aerospace", "relevancy": 50 },
  { "name": "USA", "relevancy": 50 }
]
```
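The following Python dictionary is a hedged reconstruction of such a configuration. It assumes that `collect` maps entity URIs to `UriInfo` objects as described under Options, that "global attributes" means the default `['theme', 'sector']`, and that `count` limits the output to five themes. The configuration actually used to produce the output above may differ. Passing it as `Themes(collect=collect, count=5)` further assumes that the constructor accepts the `ThemesOptions` properties as keyword arguments; the API example only confirms `count`.

```python
# Hypothetical `collect` configuration for the "Persons and events" example.
collect = {
    # use the `position` attribute of Person entities (e.g. "ceo"), not their names
    "Person": {"uri": False, "attributes": ["position"]},
    # use the canonical name of Country entities (e.g. "USA") as a candidate
    "Country": {"uri": True, "attributes": []},
    # assumption: "global attributes" means the default theme/sector attributes
    "Event": {"uri": False, "attributes": ["theme", "sector"]},
}

# assumption: these keys mirror the ThemesOptions properties
options = {"collect": collect, "count": 5}
print(sorted(options["collect"]))  # ['Country', 'Event', 'Person']
```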

# API

## Examples

### Using the Semantic Themes

This sample demonstrates how to use the `Themes` app from the `wowool.semantic_themes` package to extract and rank semantic themes from a document.

* `Pipeline` creates a text analysis pipeline.
* `Themes` is the app that extracts and ranks themes.

```python
from wowool.sdk import Pipeline
from wowool.semantic_themes import Themes

analysis = Pipeline("english,entity,semantic-theme")
themes = Themes(count=2)
# run the document analysis
document = themes(analysis("EyeOnID works on cybercrime prevention"))
for item in document.results("wowool_themes"):
    print(f" - {item['name']}: {item['relevancy']}")
```

## License

For both non-commercial and commercial use, you need to acquire a license file from https://www.wowool.com.

### Non-Commercial

This library is licensed under the GNU AGPLv3 for non-commercial use. For commercial use, a separate license must be purchased.

### Commercial License Terms

1. Grants the right to use this library in proprietary software.
2. Requires a valid license key.
3. Redistribution in SaaS requires a commercial license.
