Metadata-Version: 2.4
Name: tos-vectors-embed-cli
Version: 0.3.0
Summary: Standalone CLI for TOS Vector operations with Volcengine Ark embeddings
Home-page: https://github.com/volcengine/tos-vectors-embed-cli
Author: tos
Author-email: 
Maintainer: tos
Maintainer-email: 
Project-URL: Bug Reports, https://github.com/volcengine/tos-vectors-embed-cli/issues
Project-URL: Source, https://github.com/volcengine/tos-vectors-embed-cli
Project-URL: Documentation, https://github.com/volcengine/tos-vectors-embed-cli#readme
Project-URL: Homepage, https://github.com/volcengine/tos-vectors-embed-cli
Keywords: tos,vectors,embeddings,ark,cli,machine-learning,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Utilities
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Environment :: Console
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: tos==2.8.8b2
Requires-Dist: volcengine-python-sdk[ark]
Requires-Dist: httpx
Requires-Dist: click>=8.0.0
Requires-Dist: rich>=12.0.0
Requires-Dist: pydantic>=1.10.0
Requires-Dist: scikit-learn
Requires-Dist: numpy
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: maintainer
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Volcengine TOS Vectors Embed CLI

Volcengine TOS Vectors Embed CLI is a standalone command-line tool that simplifies the process of working with vector embeddings in TOS Vectors. You can create vector embeddings for your data using Volcengine Ark and store and query them in your TOS vector index using single commands.

## Supported Commands

**tos-vectors-embed put**: Embed text, file content, or TOS objects and store them as vectors in a TOS vector index.
You can create and ingest vector embeddings into a TOS vector index using a single put command. You specify the input to embed, a Volcengine Ark embeddings model ID, your TOS vector bucket name, and your TOS vector index name. The command supports several input formats: a text string, a local text or image file, a TOS text or image object, or a TOS prefix. It generates embeddings using the dimensions configured in your TOS vector index properties. When you ingest several objects from a TOS prefix or a local file path, the command automatically processes them in batches to maximize throughput.

**Note**: Each file is processed as a single embedding. Document chunking is not currently supported.

**tos-vectors-embed query**: Embed a query input and search for similar vectors in a TOS vector index.
You can perform similarity queries for vector embeddings in your TOS vector index using a single query command. You specify your query input, a Volcengine Ark embeddings model ID, the vector bucket name, and vector index name. The command accepts several types of query inputs like a text string, an image file, or a single TOS text or image object. The command generates embeddings for your query using the input embeddings model and then performs a similarity search to find the most relevant matches. You can control the number of results returned, apply metadata filters to narrow your search, and choose whether to include similarity distance in the results for comprehensive analysis.

### Supported Input Types

**Note**: The CLI uses a unified `--ark-inference-params` parameter for all model-specific parameters.
Additionally, the query command uses the following separate parameters:

- **`--text-value`**: Direct text query string (preferred for text queries)
- **`--text`**: Text file path (local file or TOS URI)
- **`--image`**: Image file path (local file or TOS URI)
- **`--video`**: Video file path (local file or TOS URI)

## Installation and Configuration
### Prerequisites
- Python 3.9 or higher
- Volcengine credentials configured in your environment (required to run the CLI)
- A Volcengine account with permissions to use Volcengine Ark and TOS Vectors
- Access to a Volcengine Ark embedding model
- A Volcengine TOS vector bucket and vector index in which to store your embeddings

### Quick Install (Recommended)
```bash
pip install tos-vectors-embed-cli
```

### Development Install
```bash
# Clone the repository
git clone <repository-url>
cd tos-vectors-embed-cli

# Install in development mode
pip install -e .
```

**Note**: All dependencies are automatically installed when you install the package via pip.

### Quick Start

#### **Configure credentials**

1. Set the Ark API key as an environment variable:
```bash
export ARK_API_KEY="YOUR_ARK_API_KEY"
```

2. Set the TOS credentials as environment variables:
```bash
export TOS_ACCESS_KEY="YOUR_TOS_ACCESS_KEY"
export TOS_SECRET_KEY="YOUR_TOS_SECRET_KEY"
export TOS_VECTOR_ENDPOINT="tosvectors-cn-beijing.volces.com" # Optional, defaults to cn-beijing
```
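
Before running the CLI, it can help to confirm that the variables above are actually present in the environment. The following is a minimal sketch: the variable names come from the steps above, while the `missing_credentials` helper is hypothetical and not part of the CLI.

```python
import os

# Variable names come from the configuration steps above.
# TOS_VECTOR_ENDPOINT is optional, so it is not checked here.
REQUIRED_VARS = ("ARK_API_KEY", "TOS_ACCESS_KEY", "TOS_SECRET_KEY")


def missing_credentials(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required credentials are set.")
```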

#### **Put Examples**

1. **Embed text and store it as a vector in your TOS vector index:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'Hello, world'
```

2. **Process local text files:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/sample.txt"
```

3. **Process files from a local file path using wildcard characters:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/*.txt"
```

4. **Process files from a TOS bucket using wildcard characters:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "tos://bucket/path/*"
```

5. **Process a single image file from a TOS bucket:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --image "tos://bucket/images/photo.jpg"
```

6. **Process a local image file:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --image "./images/photo.jpg"
```

7. **Process image files from a local path using wildcard characters:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --image "./images/*.jpg"
```

8. **Process image files from a TOS bucket using wildcard characters:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --image "tos://bucket/images/*"
```

9. **Add metadata to your vectors:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'Sample text' \
  --metadata '{"category": "documentation", "version": "1.0"}'
```

10. **Use a custom vector key:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'Sample text' \
  --key "doc-001"
```

11. **Use filename as vector key:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/report.txt" \
  --filename-as-key
```

12. **Use key prefix with auto-generated UUIDs:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'Sample text' \
  --key-prefix "temp/"
```

13. **Use key prefix with custom key:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'Sample text' \
  --key "doc-001" \
  --key-prefix "project-a/"
```

14. **Use key prefix with filename:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/report.txt" \
  --filename-as-key \
  --key-prefix "docs/"
```

15. **Process a local video file:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --video "./videos/sample.mp4"
```

16. **Process a video file from a TOS bucket:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --video "tos://bucket/videos/sample.mp4"
```

17. **Multimodal input (Text + Image):**
**Note**: Multimodal input currently supports only a single text and image pair.
```bash
tos-vectors-embed \
  --account-id 12345678 \
  put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'A beautiful sunset over the mountains' \
  --image "./images/sunset.jpg"
```
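
If you drive the CLI from a script, the put commands above can be assembled programmatically. The sketch below only builds the argument list from flags shown in the examples above; the `build_put_command` helper is hypothetical, and the sketch does not run the CLI itself.

```python
import shlex


def build_put_command(account_id, bucket, index, model_id, inputs, extra=None):
    """Build the argv list for `tos-vectors-embed put`.

    `inputs` maps an input flag (for example "--text-value" or "--image")
    to its value. `extra` maps any other put option to its value; use
    None as the value for switches such as "--filename-as-key".
    """
    argv = [
        "tos-vectors-embed",
        "--account-id", str(account_id),
        "put",
        "--vector-bucket-name", bucket,
        "--index-name", index,
        "--model-id", model_id,
    ]
    for flag, value in {**inputs, **(extra or {})}.items():
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


cmd = build_put_command(
    12345678, "my-bucket", "my-index", "doubao-embedding-vision-250615",
    inputs={"--text-value": "Hello, world"},
    extra={"--key": "doc-001"},
)
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without any shell quoting concerns.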

#### **Query Examples**

1. **Direct text query:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'query text' \
  --k 20
```

2. **Query using a local text file:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/query.txt" \
  --k 20 \
  --output table
```

3. **Query using a TOS text file:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "tos://my-bucket/query.txt" \
  --k 20 
```

4. **Image query:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --image "./documents/image.jpg" \
  --k 20 
```

5. **Query using a TOS image file:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --image "tos://my-bucket/image.jpg" \
  --k 20 
```

6. **Query with metadata filter (Exact match):**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'query text' \
  --filter '{"category": {"$eq": "documentation"}}'
```

7. **Query with multiple filters (AND):**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'query text' \
  --filter '{"$and": [{"category": "tech"}, {"version": {"$gte": "1.0"}}]}'
```

8. **Video query:**
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --video "./videos/query.mp4"
```

9. **Multimodal query (Text + Image):**
**Note**: Multimodal query currently supports only a single text and image pair.
```bash
tos-vectors-embed \
  --account-id 12345678 \
  query \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'search query' \
  --image "./images/query.jpg"
```

### **Wildcard Character Support**

The CLI supports wildcard characters in the input path, so a single command can process multiple files:

#### **Local Filesystem Patterns**

- **Basic wildcards**: `./data/*.txt` - all .txt files in data directory
- **Home directory**: `~/documents/*.md` - all .md files in user's documents
- **Recursive patterns**: `./docs/**/*.txt` - all .txt files recursively
- **Multiple extensions**: `./files/*.{txt,md,json}` - multiple file types
- **Question mark**: `./file?.txt` - single character wildcard

#### **TOS URI Patterns**

**Important**: TOS wildcards work with prefixes, not file extensions. Use `tos://bucket/path/*` not `tos://bucket/path/*.ext`.

**Examples:**
```bash
# Process all files under a TOS prefix
tos-vectors-embed put --vector-bucket-name bucket --index-name idx \
  --model-id doubao-embedding-vision-250615 --text "tos://bucket/path1/*"
```

### **Important Differences: Local vs TOS Wildcards**

**Local Filesystem Wildcards:**
- ✅ Support file extensions: `./data/*.txt`, `./docs/*.json`
- ✅ Support complex patterns: `./files/*.{txt,md}`, `./doc?.txt`
- ✅ Support recursive patterns: `./docs/**/*.md`

**TOS Wildcards:**
- ✅ Support prefix patterns: `tos://bucket/docs/*`, `tos://bucket/2024/reports/*`
- ❌ **Do NOT support extension filtering**: `tos://bucket/path/*.json` won't work
- ❌ **Do NOT support complex patterns**: Use prefix-based organization instead 

**Best Practices:**
- **For TOS**: Organize files by prefix/path structure: `tos://bucket/json-files/*`
- **For Local**: Use full wildcard capabilities: `./data/*.{json,txt}`
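
The difference between the two wildcard styles can be illustrated with a short sketch. It is not the CLI's implementation: `split_tos_prefix` and `expand_local` are hypothetical helpers, and the local expansion uses Python's standard `glob` module, which handles `*`, `?`, and `**` but not `{a,b}` brace lists.

```python
import glob
import os


def split_tos_prefix(uri):
    """Split `tos://bucket/prefix/*` into (bucket, prefix).

    Raises ValueError for anything other than a single trailing `*`,
    such as an extension filter like `*.json`.
    """
    if not uri.startswith("tos://"):
        raise ValueError("not a TOS URI: " + uri)
    bucket, _, path = uri[len("tos://"):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name: " + uri)
    if not path.endswith("*"):
        raise ValueError("expected a trailing '*': " + uri)
    prefix = path[:-1]
    if any(ch in prefix for ch in "*?[{"):
        raise ValueError("only prefix wildcards are supported: " + uri)
    return bucket, prefix


def expand_local(pattern):
    """Expand a local pattern, including `~` and recursive `**`."""
    return sorted(glob.glob(os.path.expanduser(pattern), recursive=True))


print(split_tos_prefix("tos://bucket/2024/reports/*"))
```

A TOS pattern therefore reduces to a bucket and a key prefix, which is why organizing objects by prefix is the practical way to select subsets of files.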

#### Global Options
- `--debug`: Enable debug mode with detailed logging for troubleshooting
- `--account-id`: Volcengine account ID
- `--vectors-region`: Region of the TOS vector bucket
- `--vectors-endpoint`: Endpoint domain name used to access the TOS vector bucket

#### Put Command Parameters
Required:
- `--vector-bucket-name`: Name of the TOS vector bucket 
- `--index-name`: Name of the vector index in your vector bucket in which to store the vector embeddings
- `--model-id`: Ark model ID to use for generating embeddings

Input Options (one required):
- `--text-value`: Direct text input to embed
- `--text`: Text input - supports multiple input types:
  - **Local file**: `./document.txt`
  - **Local files with wildcard characters**: `./data/*.txt`
  - **TOS object**: `tos://bucket/path/file.txt`
  - **TOS path with wildcard characters**: `tos://bucket/path/*`
- `--image`: Image input - supports multiple input types:
  - **Local file**: `./document.jpg`
  - **Local wildcard**: `./data/*.jpg`
  - **TOS object**: `tos://bucket/path/file.jpg`
  - **TOS path with wildcard characters**: `tos://bucket/path/*`
- `--video`: Video input (local file or TOS URI)

Optional:
- `--region`: TOS region name (effective in TOS path mode)
- `--key`: Uniquely identifies each vector in the vector index (default: auto-generated UUID)
- `--key-prefix`: Prefix to prepend to all vector keys
- `--filename-as-key`: Use filename as vector key (mutually exclusive with --key)
- `--metadata`: Additional metadata associated with the vector; provided as JSON string
- `--ark-inference-params`: Model-specific parameters passed to Ark (JSON format)
- `--max-workers`: Maximum parallel workers for batch processing (default: 4)
- `--batch-size`: Number of vectors per TOS Vector put_vectors call (1-500, default: 500)
- `--output`: Output format (json or table, default: json)

#### Query Command Parameters

**Core Required Parameters:**
- `--vector-bucket-name`: Name of the TOS vector bucket
- `--index-name`: Name of the vector index 
- `--model-id`: Ark model ID to use for generating embeddings

**Query Input Parameters (One Required):**
- `--text-value`: Direct text query string
- `--text`: Text file path (local file or TOS URI)
- `--image`: Image file path (local file or TOS URI)
- `--video`: Video file path (local file or TOS URI)

**Optional Parameters:**
- `--region`: TOS region name
- `--k`: Number of results to return (default: 30)
- `--filter`: Filter expression for metadata-based filtering (JSON format)
- `--ark-inference-params`: Model-specific parameters passed to Ark (JSON format)
- `--return-metadata`: Include metadata in results (default: true)
- `--return-distance`: Include similarity distance scores
- `--output`: Output format (table or json, default: json)

## Metadata Filtering

### **Supported Operators**

#### **Comparison Operators**
- `$eq`: Equal to
- `$ne`: Not equal to  
- `$gt`: Greater than
- `$gte`: Greater than or equal to
- `$lt`: Less than
- `$lte`: Less than or equal to
- `$in`: Value in array
- `$nin`: Value not in array

#### **Logical Operators**
- `$and`: Logical AND (all conditions must be true)
- `$or`: Logical OR (at least one condition must be true)
- `$not`: Logical NOT (condition must be false)

### **Filter Examples**

#### **Single Condition Filters**
```bash
# Exact match
--filter '{"category": {"$eq": "documentation"}}'

# Not equal
--filter '{"status": {"$ne": "archived"}}'
```
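
Filters that combine several conditions are easier to build as data than to type as quoted JSON. The sketch below is illustrative only: the helper names are hypothetical, and the shape of `$or` is assumed to mirror the `$and` form used in the query examples.

```python
import json
import shlex

# Operators listed under "Comparison Operators" above.
COMPARISON_OPS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}


def cond(field, op, value):
    """Build a single-field condition such as {"status": {"$ne": "archived"}}."""
    if op not in COMPARISON_OPS:
        raise ValueError("unsupported operator: " + op)
    return {field: {op: value}}


def all_of(*conditions):
    """Combine conditions with $and."""
    return {"$and": list(conditions)}


def any_of(*conditions):
    """Combine conditions with $or (shape assumed to mirror $and)."""
    return {"$or": list(conditions)}


flt = all_of(
    cond("category", "$eq", "documentation"),
    cond("status", "$ne", "archived"),
)
arg = json.dumps(flt)
# shlex.quote makes the JSON safe to paste after --filter in a shell.
print("--filter " + shlex.quote(arg))
```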

## Vector Key Management

The CLI provides flexible options for managing vector keys:

- **Auto-Generated UUID (Default)**: If no key is provided, a random UUID is generated.
- **Custom Key (`--key`)**: Specify a unique identifier for each vector.
- **Object-Based Key (`--filename-as-key`)**: Use the filename (for local files) or object key (for TOS objects) as the vector key.
- **Key Prefix (`--key-prefix`)**: Prepend a string to all generated or provided keys.
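
The rules above can be summarized as a small decision function. This is a sketch of the described behavior, not the CLI's code: `resolve_key` is a hypothetical name, and it assumes that "filename" means the base name of a local path and that the object key is everything after the bucket name in a TOS URI.

```python
import os
import uuid


def resolve_key(source=None, key=None, key_prefix="", filename_as_key=False):
    """Return the vector key for one input, following the options above."""
    if key is not None and filename_as_key:
        raise ValueError("--key and --filename-as-key are mutually exclusive")
    if key is not None:
        base = key
    elif filename_as_key:
        if source is None:
            raise ValueError("--filename-as-key needs a file or TOS object")
        if source.startswith("tos://"):
            # Assumption: the object key is the part after the bucket name.
            base = source[len("tos://"):].partition("/")[2]
        else:
            # Assumption: local files use their base name.
            base = os.path.basename(source)
    else:
        base = str(uuid.uuid4())
    return key_prefix + base


print(resolve_key(key="doc-001", key_prefix="project-a/"))
print(resolve_key(source="./documents/report.txt", filename_as_key=True))
```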

## Metadata

The Volcengine TOS Vectors Embed CLI automatically adds standard metadata fields to help track and manage your vector embeddings. Understanding these fields is important for filtering and troubleshooting your vector data.

### Standard Metadata Fields

The CLI automatically adds the following metadata fields to each vector where they apply:

#### `TOS-VECTORS-EMBED-SRC-CONTENT`
- **Purpose**: Stores the original text content. 
- **Behavior**:
  - **Direct text input** (`--text-value`): Contains the actual text content
  - **Text files**: Contains the full text content of the file
  - **Image files**: N/A (images don't have textual content to store) 

**Examples**:
```bash
# Direct text - stores the actual text
--text-value 'Hello world'
# Metadata: {"TOS-VECTORS-EMBED-SRC-CONTENT": "Hello world"}

# Text file - stores file content
--text document.txt
# Metadata: {"TOS-VECTORS-EMBED-SRC-CONTENT": "Contents of document.txt..."}
```

#### `TOS-VECTORS-EMBED-SRC-LOCATION`
- **Purpose**: Tracks the original file location
- **Behavior**:
  - **Text files**: Contains the file path or TOS URI
  - **Image files**: Contains the file path or TOS URI
  - **Direct text**: Not added (no file involved)

**Examples**:
```bash
# Local text file
--text /path/to/document.txt
# Metadata: {
#   "TOS-VECTORS-EMBED-SRC-CONTENT": "File contents...",
#   "TOS-VECTORS-EMBED-SRC-LOCATION": "file:///path/to/document.txt"
# }

# TOS text file
--text tos://my-bucket/docs/file.txt
# Metadata: {
#   "TOS-VECTORS-EMBED-SRC-CONTENT": "File contents...",
#   "TOS-VECTORS-EMBED-SRC-LOCATION": "tos://my-bucket/docs/file.txt"
# }
```

#### `TOS-VECTORS-EMBED-SRC-CONTENT-TYPE`
- **Purpose**: Indicates the type of content (TEXT, IMAGE, VIDEO).

### Additional Metadata

You can add your own metadata using the `--metadata` parameter with JSON format:

```bash
tos-vectors-embed put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'Sample text' \
  --metadata '{"category": "documentation", "version": "1.0", "author": "team-a"}'
```

**Result**: Your metadata is merged with the standard metadata fields:
```json
{
  "TOS-VECTORS-EMBED-SRC-CONTENT": "Sample text",
  "TOS-VECTORS-EMBED-SRC-CONTENT-TYPE": "TEXT",
  "category": "documentation",
  "version": "1.0", 
  "author": "team-a"
}
```
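
The merge shown above can be sketched as follows. The field names come from this section, but the `build_metadata` helper is hypothetical, and the source does not say what happens when a user key collides with a standard field; this sketch lets the user value win, which is an assumption.

```python
def build_metadata(content_type, content=None, location=None, user_metadata=None):
    """Combine the standard fields with user-supplied metadata."""
    metadata = {}
    if content is not None:
        # Present for direct text and text files, absent for images.
        metadata["TOS-VECTORS-EMBED-SRC-CONTENT"] = content
    if location is not None:
        # Present for files and TOS objects, absent for direct text.
        metadata["TOS-VECTORS-EMBED-SRC-LOCATION"] = location
    metadata["TOS-VECTORS-EMBED-SRC-CONTENT-TYPE"] = content_type
    # Assumption: user keys are applied last and win on a name collision.
    metadata.update(user_metadata or {})
    return metadata


merged = build_metadata(
    "TEXT",
    content="Sample text",
    user_metadata={"category": "documentation", "version": "1.0", "author": "team-a"},
)
print(merged)
```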

## Batch Processing

The CLI supports efficient batch processing for multiple files using local and TOS wildcard paths.

### **Batch Processing Features**

- **Automatic batching**: Large datasets are automatically split into batches of up to 500 vectors by default (configurable with `--batch-size`)
- **Parallel processing**: Configurable workers for concurrent processing
- **Error resilience**: Individual file failures don't stop batch processing
- **Performance optimization**: Efficient memory usage and API call batching
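
The features above follow a common pattern: embed files concurrently, collect failures without aborting, then store the results in fixed-size batches. The sketch below illustrates that pattern under stated assumptions; `embed` and `store` are hypothetical callables standing in for the Ark and TOS Vectors calls, and none of this is the CLI's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, batch_size=500):
    """Split `items` into lists of at most `batch_size` elements."""
    if not 1 <= batch_size <= 500:
        raise ValueError("batch_size must be between 1 and 500")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_all(paths, embed, store, max_workers=4, batch_size=500):
    """Embed each path in parallel, then store the vectors batch by batch.

    `embed(path)` returns one vector or raises; `store(batch)` stores a
    list of vectors. Returns (number of vectors stored, failed paths).
    """
    vectors, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(path, pool.submit(embed, path)) for path in paths]
        for path, future in futures:
            try:
                vectors.append(future.result())
            except Exception:
                # One bad file is recorded and does not stop the run.
                failed.append(path)
    for batch in chunked(vectors, batch_size):
        store(batch)
    return len(vectors), failed
```

Separating the embedding stage from the storage stage keeps the number of storage calls at roughly one per `batch_size` vectors, regardless of how many workers are embedding.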

### **Processing Strategy by Content Type**

The CLI automatically selects the optimal processing strategy based on content type:

| Content Type | Processing Mode | API Used | Batch Strategy | Output |
|--------------|----------------|----------|----------------|---------|
| **Text**  | Sync  | Ark API | Parallel batch storage | Single vector per file |
| **Image** | Sync  | Ark API | Parallel batch storage | Single vector per file |
| **Video** | Sync  | Ark API | Per-file storage | Multiple vectors per file |

### **Batch Examples**

1. **Process local files with custom parallel workers:**
```bash
tos-vectors-embed --account-id 12345678 put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/*.txt" \
  --max-workers 8
```

2. **Process files with custom batch size for TOS storage:**
```bash
tos-vectors-embed --account-id 12345678 put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "tos://bucket/path/*" \
  --batch-size 100
```

### **Batch Processing Output**

**Text/Image Batch Output:**
```bash
tos-vectors-embed --account-id 12345678 put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text "./documents/*.txt"
```

**Output:**
```json
{
  "type": "streaming_batch",
  "bucket": "my-bucket",
  "index": "my-index",
  "model": "doubao-embedding-vision-250615",
  "contentType": "text",
  "totalFiles": 94,
  "processedFiles": 94,
  "failedFiles": 0,
  "totalVectors": 94,
  "vectorKeys": [
    "abc-123...",
    "def-456..."
  ]
}
```
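
Because the batch report is JSON, a script can check it after each run. The following sketch reads only fields that appear in the sample output above; the `summarize` helper is hypothetical.

```python
import json


def summarize(report_text):
    """Extract the pass/fail counters from a batch report."""
    report = json.loads(report_text)
    return {
        "ok": report["failedFiles"] == 0,
        "total_files": report["totalFiles"],
        "failed_files": report["failedFiles"],
        "total_vectors": report["totalVectors"],
    }


sample = """
{
  "type": "streaming_batch",
  "totalFiles": 94,
  "processedFiles": 94,
  "failedFiles": 0,
  "totalVectors": 94,
  "vectorKeys": ["abc-123...", "def-456..."]
}
"""
print(summarize(sample))
```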

## Troubleshooting

### **Use Debug Mode**

For detailed information about API calls and performance, use the `--debug` flag:

```bash
tos-vectors-embed --debug --account-id 12345678 put \
  --vector-bucket-name my-bucket \
  --index-name my-index \
  --model-id doubao-embedding-vision-250615 \
  --text-value 'test'
```

### **Common Issues**

1. **Credentials Not Found**: Ensure `ARK_API_KEY`, `TOS_ACCESS_KEY`, and `TOS_SECRET_KEY` are set in your environment.
2. **Invalid Vector Dimension**: The CLI automatically fetches the index dimension. Ensure your Ark model supports the dimension configured in your TOS index.
3. **Account ID Format**: The `--account-id` must be a numeric string.

## Model Compatibility

| Model | Type | Use Case |
|-------|------|----------|
| `doubao-embedding-vision-250615` | Multimodal (Text + Image) | Modern text and image embedding |
| `doubao-embedding-vision-251215` | Multimodal (Text + Image) | Advanced text and image embedding |

## Repository Structure
```
tos-vectors-embed-cli/
├── tos_vectors/                       # Main package directory
│   ├── cli.py                        # Main CLI entry point
│   ├── commands/                     # Command implementations
│   │   ├── embed_put.py              # Vector embedding and storage
│   │   └── embed_query.py            # Vector similarity search
│   ├── core/                         # Core functionality
│   │   ├── unified_processor.py      # Unified processing logic
│   │   ├── services.py               # Ark and TOS Vector services
│   │   └── streaming_batch_orchestrator.py  # Batch processing
│   └── utils/                        # Utility functions
│       ├── config.py                 # Configuration management
│       ├── models.py                 # Model definitions and capabilities
│       └── multimodal_helpers.py     # Multimodal processing helpers
├── setup.py                          # Package installation configuration
├── pyproject.toml                    # Modern Python packaging configuration
├── requirements.txt                  # Python dependencies
├── LICENSE                           # Apache 2.0 license
├── NOTICE                            # Attribution notices
```

## Acknowledgement

This project is derived from the [s3-vectors-embed-cli](https://github.com/awslabs/s3-vectors-embed-cli) project, which is licensed under the Apache License 2.0. We thank the original authors for their contributions to the open-source community.
