Metadata-Version: 2.4
Name: knowledgesdk
Version: 0.2.0
Summary: KnowledgeSDK Python SDK — Extract, classify and search web knowledge
Project-URL: Homepage, https://knowledgesdk.com
Project-URL: Repository, https://github.com/knowledgesdk/knowledgesdk-python
Project-URL: Issues, https://github.com/knowledgesdk/knowledgesdk-python/issues
Author: KnowledgeSDK
License-Expression: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.8
Requires-Dist: pydantic>=2.0.0
Requires-Dist: requests>=2.31.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: flake8>=6.0.0; extra == 'dev'
Requires-Dist: isort>=5.0.0; extra == 'dev'
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# KnowledgeSDK Python SDK

Official Python client for the [KnowledgeSDK](https://knowledgesdk.com) API — extract, classify, scrape, screenshot, and search web knowledge programmatically.

## Installation

```bash
pip install knowledgesdk
```

## Quick Start

```python
from knowledgesdk import KnowledgeSDK

ks = KnowledgeSDK("sk_ks_your_key_here")
```

## Usage

### Extract

Run a full knowledge extraction on a website. This call is synchronous and blocks until the extraction finishes:

```python
result = ks.extract.run("https://stripe.com")

print(result.business.business_name)
print(result.business.industry_sector)
print(result.pages_scraped)

for item in result.knowledge_items:
    print(item.title, item.content)
```
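
To save the extracted knowledge as a single Markdown file, you can use a small helper that renders each item. This is a minimal sketch, not part of the SDK: `Item` and `items_to_markdown` are hypothetical, and the sketch assumes `title` and `content` are plain strings.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Item:
    """Hypothetical stand-in for KnowledgeItem with the two fields used above."""
    title: str
    content: str


def items_to_markdown(items: Iterable[Item]) -> str:
    """Render each item as a level-2 heading followed by its trimmed content."""
    sections = [f"## {item.title}\n\n{item.content.strip()}" for item in items]
    # Separate sections with a blank line and end the file with a newline.
    return "\n\n".join(sections) + "\n"
```

The helper only reads `title` and `content`, so it should also accept `result.knowledge_items` directly, e.g. `Path("stripe.md").write_text(items_to_markdown(result.knowledge_items))` after `from pathlib import Path`.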

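Start an asynchronous extraction. The call returns a job reference (`AsyncJobRef`) right away. You can track the job with `ks.jobs` or receive a notification at `callback_url`: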

```python
job = ks.extract.run_async(
    "https://stripe.com",
    max_pages=20,
    callback_url="https://myapp.com/webhook"
)

print(job.job_id)   # e.g. "job_abc123"
print(job.status)   # e.g. "PENDING"
```

### Scrape

Scrape a single web page and get its Markdown content:

```python
page = ks.scrape.run("https://docs.stripe.com/get-started")

print(page.title)
print(page.markdown)
print(page.links)
```
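
If you plan to crawl further from a scraped page, you may want only the links that stay on the same site. Here is a minimal sketch, assuming `page.links` is a list of URL strings that may be relative. `same_site_links` is a hypothetical helper, not an SDK method:

```python
from typing import Iterable, List
from urllib.parse import urljoin, urlparse


def same_site_links(page_url: str, links: Iterable[str]) -> List[str]:
    """Return links on the same host as page_url, de-duplicated, in order."""
    host = urlparse(page_url).netloc
    seen = set()
    result = []
    for link in links:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute = urljoin(page_url, link).split("#", 1)[0]
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Usage: `same_site_links("https://docs.stripe.com/get-started", page.links)`.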

### Classify

Classify a business from its website:

```python
biz = ks.classify.run("https://stripe.com")

print(biz.business_name)
print(biz.business_type)
print(biz.industry_sector)
print(biz.target_audience)
print(biz.confidence_score)
```

### Screenshot

Capture a screenshot of a web page:

```python
import base64

shot = ks.screenshot.run("https://stripe.com")

# shot.screenshot is a base64-encoded PNG string
image_bytes = base64.b64decode(shot.screenshot)
with open("screenshot.png", "wb") as f:
    f.write(image_bytes)
```
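
Before writing the file, you may want to confirm that the decoded bytes are actually a PNG. This minimal sketch uses only the standard library. It checks the fixed 8-byte signature that every PNG file starts with. `decode_png` is a hypothetical helper, not part of the SDK:

```python
import base64

# The first 8 bytes of every PNG file, fixed by the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode a base64 string and check that the result looks like a PNG."""
    # validate=True rejects characters outside the base64 alphabet.
    image_bytes = base64.b64decode(b64_data, validate=True)
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return image_bytes
```

Usage: `image_bytes = decode_png(shot.screenshot)` in place of the plain `b64decode` call.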

### Sitemap

Fetch the sitemap for a website:

```python
site_map = ks.sitemap.run("https://stripe.com")

print(site_map.count)
for url in site_map.urls:
    print(url)
```
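
For a quick overview of a large sitemap, you can count URLs per top-level section. This sketch assumes `site_map.urls` holds absolute URL strings, as the loop above suggests. `count_sections` is hypothetical:

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlparse


def count_sections(urls: Iterable[str]) -> Counter:
    """Count URLs by their first path segment ("/" for the site root)."""
    counts: Counter = Counter()
    for url in urls:
        # Split the path and drop empty pieces left by leading/trailing slashes.
        segments = [s for s in urlparse(url).path.split("/") if s]
        counts["/" + segments[0] if segments else "/"] += 1
    return counts
```

Usage: `count_sections(site_map.urls).most_common(5)` lists the five largest sections.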

### Search

Search the extracted knowledge base:

```python
results = ks.search.run("pricing plans", limit=5)

print(f"Found {results.total} results")
for hit in results.hits:
    print(hit.title, hit.score)
    print(hit.content)
```
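
To filter results on the client side, you could keep only hits above a score threshold. This sketch assumes a higher `score` means a better match. The score scale is not documented here, so treat the `0.5` default as a placeholder. `Hit` is a hypothetical stand-in for `SearchHit`, and `top_hits` is not an SDK method:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical stand-in for SearchHit with only the fields used here."""
    title: str
    score: float


def top_hits(hits: List[Hit], min_score: float = 0.5, k: int = 3) -> List[Hit]:
    """Keep hits scoring at least min_score, best first, at most k of them."""
    kept = [h for h in hits if h.score >= min_score]
    return sorted(kept, key=lambda h: h.score, reverse=True)[:k]
```

Usage: `top_hits(results.hits, min_score=0.5)`.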

### Webhooks

Create, list, test, and delete webhooks:

```python
# Create a webhook
wh = ks.webhooks.create(
    url="https://myapp.com/hook",
    events=["EXTRACTION_COMPLETED", "JOB_FAILED"],
    display_name="My App Webhook"
)
print(wh.id)    # e.g. "weh_xxx"
print(wh.token) # signing token

# List all webhooks
all_webhooks = ks.webhooks.list()
for w in all_webhooks:
    print(w.id, w.url, w.status)

# Send a test event to a webhook
ks.webhooks.test("weh_xxx")

# Delete a webhook
ks.webhooks.delete("weh_xxx")
```
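
The README describes `wh.token` only as a signing token. If the service signs each delivery with it, your receiver should verify the signature before trusting the payload. The sketch below *assumes* an HMAC-SHA256 hex digest of the raw request body keyed with the token. That scheme and the `verify_signature` helper are hypothetical, so check the actual scheme in the API documentation:

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature: str, token: str) -> bool:
    """Check a hex HMAC-SHA256 signature of the raw request body.

    ASSUMPTION: HMAC-SHA256 keyed with the webhook token is illustrative only.
    """
    expected = hmac.new(token.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(expected, signature)
```

Compare against the signature sent with each delivery (the header name is not documented here), and reject the request when `verify_signature` returns `False`.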

### Jobs

Retrieve a job by ID:

```python
job = ks.jobs.get("job_xxx")
print(job.status)   # PENDING | RUNNING | COMPLETED | FAILED
print(job.progress) # 0–100
print(job.result)
```

Block until a job completes, checking its status every `interval_sec` seconds for at most `timeout_sec` seconds:

```python
completed = ks.jobs.poll("job_xxx", interval_sec=5, timeout_sec=300)
print(completed.result)
```
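
`ks.jobs.poll` already handles this for you. The sketch below only illustrates the general shape of such a loop, assuming `COMPLETED` and `FAILED` are the only terminal statuses. `poll_status` is hypothetical. It raises the built-in `TimeoutError`, which may differ from what the SDK raises:

```python
import time
from typing import Callable

# Statuses after which a job no longer changes (assumed from the list above).
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_status(
    fetch_status: Callable[[], str],
    interval_sec: float = 5,
    timeout_sec: float = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_sec
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if the next check would land past the deadline.
        if time.monotonic() + interval_sec > deadline:
            raise TimeoutError(f"job still {status} after {timeout_sec}s")
        sleep(interval_sec)
```

Usage: `poll_status(lambda: ks.jobs.get("job_xxx").status)`. Because `sleep` is a parameter, tests can run the loop without waiting.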

## Configuration

| Parameter | Default | Description |
|---|---|---|
| `api_key` | required | API key starting with `sk_ks_` |
| `base_url` | `https://api.knowledgesdk.com` | Override via `KNOWLEDGESDK_BASE_URL` env var |
| `timeout` | `30000` | Request timeout in milliseconds |
| `max_retries` | `5` | Max retries with exponential backoff |
| `debug` | `False` | Enable request/response logging |
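
The table says only that retries use exponential backoff. The base delay, cap, and jitter the SDK actually uses are not documented here. The sketch below shows one common schedule with illustrative values (0.5 s base, 30 s cap), plus optional "full jitter" that randomizes each wait. Both helpers are hypothetical:

```python
import random
from typing import List


def backoff_delays(max_retries: int = 5, base_sec: float = 0.5, cap_sec: float = 30.0) -> List[float]:
    """Return the wait before each retry: base * 2**attempt, capped at cap_sec."""
    return [min(cap_sec, base_sec * 2 ** attempt) for attempt in range(max_retries)]


def with_full_jitter(delays: List[float], rng: random.Random) -> List[float]:
    """Randomize each delay uniformly in [0, delay] to spread out retries."""
    return [rng.uniform(0, d) for d in delays]
```

With these defaults, the five waits are 0.5, 1, 2, 4 and 8 seconds.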

### Environment Variables

```bash
export KNOWLEDGESDK_BASE_URL="https://api.knowledgesdk.com"
```
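
The README says the environment variable overrides the default base URL, but it does not say how the variable interacts with an explicit `base_url` argument. The sketch below assumes the common order: explicit argument, then environment variable, then default. `resolve_base_url` is hypothetical:

```python
import os
from typing import Optional

DEFAULT_BASE_URL = "https://api.knowledgesdk.com"


def resolve_base_url(explicit: Optional[str] = None) -> str:
    """Pick the base URL: explicit argument, then env var, then the default.

    ASSUMPTION: this precedence order is illustrative, not documented.
    """
    url = explicit or os.environ.get("KNOWLEDGESDK_BASE_URL") or DEFAULT_BASE_URL
    # Strip a trailing slash so request paths can be appended with "/".
    return url.rstrip("/")
```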

### Debug Mode

```python
ks = KnowledgeSDK("sk_ks_your_key", debug=True)

# Or toggle at runtime
ks.set_debug_mode(True)
```

### Custom Headers

```python
ks.set_header("X-Custom-Header", "value")
ks.set_headers({"X-Header-A": "a", "X-Header-B": "b"})
```

## Error Handling

```python
from knowledgesdk import (
    KnowledgeSDK,
    AuthenticationError,
    APIError,
    RateLimitError,
    NetworkError,
    TimeoutError as KnowledgeSDKTimeoutError,  # avoid shadowing the built-in TimeoutError
)

ks = KnowledgeSDK("sk_ks_your_key")

# Catch the most specific exceptions first: an except clause also matches subclasses.
try:
    result = ks.extract.run("https://stripe.com")
except AuthenticationError as e:
    print(f"Auth error: {e.message}")
except RateLimitError as e:
    print(f"Rate limited: {e.message}")
except APIError as e:
    print(f"API error {e.status_code}: {e.message}")
except KnowledgeSDKTimeoutError as e:
    print(f"Request timed out: {e.message}")
except NetworkError as e:
    print(f"Network error: {e.message}")
```

## Type Reference

All response objects are Pydantic models and are fully typed.

| Type | Description |
|---|---|
| `ExtractResult` | Full extraction with business and knowledge items |
| `BusinessClassification` | Business name, type, industry, audience, etc. |
| `KnowledgeItem` | A single knowledge article extracted from a page |
| `ScrapeResult` | Markdown content, title, description, links |
| `ScreenshotResult` | Base64 PNG screenshot |
| `SitemapResult` | List of URLs from the site's sitemap |
| `SearchResult` | Search hits, total count, query |
| `SearchHit` | Individual search result with score |
| `AsyncJobRef` | Job ID and initial status for async operations |
| `JobResult` | Full job status, progress, result, and error |
| `WebhookFull` | Webhook ID, URL, events, status, token |

## Requirements

- Python >= 3.8
- `requests >= 2.31.0`
- `pydantic >= 2.0.0`

## License

MIT
