Metadata-Version: 2.4
Name: spiderkit
Version: 0.1.0
Summary: A feature-rich Python crawler toolkit providing encryption and decryption, data storage, asynchronous downloading, and font parsing
Author-email: Dawn <3439972272@qq.com>
License: MIT
Keywords: async,crawler,download,encryption,font-parser,spider
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Security :: Cryptography
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: aiofiles>=23.0.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: ddddocr>=1.4.0
Requires-Dist: fonttools>=4.40.0
Requires-Dist: loguru>=0.7.0
Requires-Dist: m3u8>=3.5.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: pycryptodome>=3.18.0
Requires-Dist: tqdm>=4.65.0
Description-Content-Type: text/markdown

# SpiderKit

A feature-rich Python crawler toolkit that provides encryption and decryption, data storage, asynchronous downloading, and font parsing.

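## Features

- **Encryption and decryption module**: Supports RSA, AES, DES, 3DES, and other encryption algorithms
- **Data storage module**: Saves data in CSV, JSON, and JSONL formats
- **Asynchronous downloader**: High-performance asynchronous file downloads, including M3U8 video downloads
- **Font parsing module**: Automatically parses anti-scraping font files and generates character mappings
- **General utilities module**: Provides common hash functions and helper methods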

## Installation

```bash
pip install spiderkit
```

## Quick Start

### Encryption and Decryption

```python
import os

from spiderkit.crypto import generate_rsa_keypair
from spiderkit.crypto import rsa_encrypt, rsa_decrypt, aes_encrypt, aes_decrypt

plaintext = "Hello Dawn!"

# RSA encryption and decryption
public_key, private_key = generate_rsa_keypair()
rsa_encrypted = rsa_encrypt(plaintext, public_key, "OAEP")
print(rsa_encrypted)
rsa_decrypted = rsa_decrypt(rsa_encrypted, private_key, "OAEP")
print(rsa_decrypted)

# AES encryption and decryption (32-byte key, 16-byte IV, CBC mode)
aes_key = os.urandom(32)
aes_iv = os.urandom(16)
aes_encrypted = aes_encrypt(plaintext, aes_key, "CBC", iv=aes_iv)
print(aes_encrypted)
aes_decrypted = aes_decrypt(aes_encrypted, aes_key, "CBC", iv=aes_iv)
print(aes_decrypted)
```

### Asynchronous Downloads

```python
import asyncio
from spiderkit.downloader import Downloader, M3U8Downloader

# Optional request headers (some sites use hotlink protection and require a Referer field)
headers = {
    "Referer": "https://www.example.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.98 Safari/537.36"
}

# Regular file downloads: each key is a local save path, each value the source URL
downloader = Downloader(headers=headers)
file_mapping = {
    "images/image1.jpg": "https://example.com/image1.jpg",
    "images/image2.jpg": "https://example.com/image2.jpg"
}
downloader.download_files(file_mapping)

# M3U8 video download
m3u8_downloader = M3U8Downloader(headers=headers)
m3u8_downloader.download_video("https://example.com/video.m3u8", "output_video.mp4")
```

### Font Parsing

```python
from spiderkit.font import parse_font_url, decrypt_text_with_font_maps

# Parse a font file from a local path or a URL
# font_maps = parse_font_url("fonts/font.woff")
font_maps = parse_font_url("https://example.com/font.woff")

# Decrypt the text
encrypted_text = "encrypted text"
decrypted_text = decrypt_text_with_font_maps(encrypted_text, font_maps)
print(decrypted_text)
```
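
Anti-scraping fonts typically render unusual code points (often from the Unicode private-use area) as ordinary-looking glyphs, so decoding comes down to a character-by-character substitution. The sketch below illustrates that idea only. It assumes a font map is a plain `dict` from obfuscated character to real character, which may not match what `parse_font_url` actually returns, and `apply_font_map` is a hypothetical helper, not part of SpiderKit.

```python
def apply_font_map(text: str, font_map: dict[str, str]) -> str:
    """Hypothetical helper: replace each mapped character, keep the rest unchanged."""
    return "".join(font_map.get(ch, ch) for ch in text)


# Assumed mapping: private-use code points that the custom font draws as digits.
font_map = {"\ue001": "1", "\ue002": "2", "\ue003": "3"}

obfuscated = "Price: \ue003\ue001\ue002"
print(apply_font_map(obfuscated, font_map))
```

Characters missing from the map pass through untouched, so ordinary text mixed with obfuscated digits survives the substitution.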

### Hashing

```python
from spiderkit.utils.hash_utils import md5, sha1, sha256, sha512, sha3_256, blake2b

text = "Hello Dawn!"

# Output is hex by default
print(md5(text))
print(sha1(text))
print(sha256(text))
print(sha512(text))

# Other algorithms
print(sha3_256(text))
print(blake2b(text))

# Other output formats: binary / base64
print(md5(text, "binary"))
print(md5(text, "base64"))
```
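
The three output formats can be illustrated with the standard library alone. This is a sketch, not SpiderKit's implementation: `digest` is a hypothetical helper, and it assumes that `binary` means the raw digest bytes and `base64` means the Base64 encoding of those same bytes.

```python
import base64
import hashlib


def digest(text: str, algorithm: str = "md5", output: str = "hex"):
    """Hypothetical helper: hash UTF-8 text and return it in the requested format."""
    raw = hashlib.new(algorithm, text.encode("utf-8")).digest()
    if output == "hex":
        return raw.hex()  # lowercase hexadecimal string
    if output == "binary":
        return raw  # raw digest bytes
    if output == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unsupported output format: {output}")


print(digest("Hello Dawn!"))
print(digest("Hello Dawn!", "sha256", "base64"))
```

All three formats carry the same digest, so the choice only depends on what the target site or API expects.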

### Data Storage

```python
from spiderkit.storage import save_data_to_file

data = [
    {"name": "张三", "age": 25},
    {"name": "李四", "age": 30}
]

# Save as CSV
save_data_to_file(data, "users", "csv")

# Save as JSON
save_data_to_file(data, "users", "json")

# Save as JSONL
save_data_to_file(data, "users", "jsonl")
```
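
To make the difference between the three formats concrete, here is a standard-library sketch of what each serialization could look like for the same records. It does not reproduce `save_data_to_file`. The helpers `to_csv`, `to_json`, and `to_jsonl` are hypothetical, and details such as indentation, column order, and line endings are assumptions.

```python
import csv
import io
import json

data = [
    {"name": "张三", "age": 25},
    {"name": "李四", "age": 30},
]


def to_csv(rows):
    """One header row, then one comma-separated line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """A single JSON array holding every record."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_jsonl(rows):
    """One standalone JSON object per line, convenient for appending and streaming."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows) + "\n"


print(to_csv(data))
print(to_json(data))
print(to_jsonl(data))
```

JSONL tends to suit crawlers that write records incrementally, because each line can be appended and parsed independently, while a JSON array has to be complete before it is valid.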