Metadata-Version: 2.4
Name: ginzaserver
Version: 0.1.0
Summary: HTTP Server for GiNZA - Japanese NLP Library
Home-page: https://github.com/oyahiroki/ginzaserver
Author: Hiroki Oya
Author-email: Hiroki Oya <oyahiroki@gmail.com>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/oyahiroki/ginzaserver
Project-URL: Repository, https://github.com/oyahiroki/ginzaserver
Keywords: ginza,spacy,japanese,nlp,http-server
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ginza>=5.2.0
Requires-Dist: ja-ginza
Provides-Extra: electra
Requires-Dist: ja-ginza-electra; extra == "electra"
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# ginzaserver

HTTP Server for GiNZA - Japanese NLP Library

A high-performance, multi-threaded HTTP server that provides REST API access to [GiNZA](https://megagonlabs.github.io/ginza/), a Japanese natural language processing library built on spaCy.

## Features

- 🚀 **Multi-threaded server** using `ThreadingMixIn` for concurrent request handling
- 🎯 **Dual model support**: Choose between `ja_ginza` (fast) or `ja_ginza_electra` (accurate)
- 🔥 **GPU acceleration** support for enhanced performance
- 📊 **Performance optimized** with list comprehensions and efficient memory management
- 🌐 **REST API** with both GET and POST endpoints
- 📝 **JSON response format** with detailed token analysis
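
The `ThreadingMixIn` approach named above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual source: `ThreadedHTTPServer`, `EchoHandler`, and `post_text` are hypothetical names, and the handler simply echoes the posted text instead of running GiNZA.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """HTTPServer that handles each request in its own thread."""
    daemon_threads = True  # worker threads do not block interpreter exit


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: echoes the posted text instead of calling GiNZA."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        body = json.dumps({"text": payload["text"]}, ensure_ascii=False).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def post_text(port, text):
    """Send one POST request to the local server and return the decoded JSON."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            "/",
            body=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; the real server takes the port from the CLI.
    server = ThreadedHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post_text(server.server_address[1], "今日はいい天気です"))
    server.shutdown()
    server.server_close()
```

Listing `ThreadingMixIn` before `HTTPServer` in the base classes is what makes its per-request threading override the default single-threaded request handling.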

## Installation

### Prerequisites

Python 3.8 or higher is required.

### Install GiNZA Models

Choose one or both models based on your needs:

```bash
# Fast model (recommended for production)
pip install -U ginza ja_ginza

# Accurate model (higher memory usage, ~16GB RAM recommended)
pip install -U ginza ja_ginza_electra
```

### Install ginzaserver

Install directly from GitHub:

```bash
pip install git+https://github.com/oyahiroki/ginzaserver
```

## Usage

### Running the Server

```bash
ginzaserver <port> <option>
```

**Parameters:**
- `port`: Port number to listen on (e.g., 8888)
- `option`: Model selection
  - `0`: Use `ja_ginza` (faster, roughly 10-20 ms per request)
  - `1`: Use `ja_ginza_electra` (more accurate, roughly 40-50 ms per request)

**Example:**

```bash
ginzaserver 8888 0
```

### API Endpoints

#### POST Request

Send JSON data with a `text` field:

```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"text":"今日はいい天気です"}' \
  http://localhost:8888/
```

#### GET Request

Pass text as a URL-encoded query parameter:

```bash
curl "http://localhost:8888/?text=%E4%BB%8A%E6%97%A5%E3%81%AF%E3%81%84%E3%81%84%E5%A4%A9%E6%B0%97%E3%81%A7%E3%81%99"
```
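
If you build the GET URL in code rather than by hand, the percent-encoding can be produced with the standard library. The sketch below assumes the default host and port from the examples above; `build_get_url` is a hypothetical helper, not part of ginzaserver.

```python
from urllib.parse import quote, urlencode


def build_get_url(text, host="localhost", port=8888):
    """Return the GET URL for `text`, percent-encoding it as UTF-8."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"text": text}, quote_via=quote)
    return f"http://{host}:{port}/?{query}"


print(build_get_url("今日はいい天気です"))
```

For the sample sentence this produces the same URL as the `curl` command above.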

### Response Format

The server returns JSON with the following structure:

```json
{
  "type": "doc",
  "sents": [
    {
      "tokens": [
        {
          "i": 0,
          "orth": "今日",
          "tag": "名詞-普通名詞-副詞可能",
          "pos": "NOUN",
          "lemma": "今日",
          "head.i": 3,
          "dep": "obl"
        },
        ...
      ]
    }
  ]
}
```

**Token Fields:**
- `i`: Token index in the document
- `orth`: Original word form
- `tag`: Detailed part-of-speech tag
- `pos`: Universal part-of-speech tag
- `lemma`: Base form of the word
- `head.i`: Index of the syntactic head
- `dep`: Dependency relation
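
As an illustration of how these fields fit together, the sketch below turns a parsed response into (token, relation, head) triples by resolving each `head.i` against the token index `i`. `dependency_edges` is a hypothetical helper, and `sample` is an abridged, hand-written document rather than real server output: the root entry is an assumption modeled on spaCy's convention that a root token is its own head.

```python
def dependency_edges(doc):
    """Return (token, relation, head token) triples for every token in `doc`."""
    tokens = [tok for sent in doc["sents"] for tok in sent["tokens"]]
    # Map document-level token index to surface form so head.i can be resolved.
    orth_by_index = {tok["i"]: tok["orth"] for tok in tokens}
    return [
        (tok["orth"], tok["dep"], orth_by_index.get(tok["head.i"]))
        for tok in tokens
    ]


# Abridged, hand-written sample (assumption): only the fields used here are included.
sample = {
    "type": "doc",
    "sents": [
        {
            "tokens": [
                {"i": 0, "orth": "今日", "head.i": 3, "dep": "obl"},
                {"i": 3, "orth": "天気", "head.i": 3, "dep": "ROOT"},
            ]
        }
    ],
}

for edge in dependency_edges(sample):
    print(edge)
```

A head index that is missing from the response resolves to `None` instead of raising an error.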

## Client Example

A sample client is included in `examples/ginzaclient.py`:

```python
import urllib.request
import json

url = 'http://localhost:8888'
method = 'POST'
headers = {'Content-Type': 'application/json'}

obj = {'text': '今日はいい天気です'}
requestbody = json.dumps(obj).encode('utf-8')

request = urllib.request.Request(url, data=requestbody, method=method, headers=headers)
with urllib.request.urlopen(request) as response:
    response_body = response.read().decode('utf-8')

# Parse into a separate name so the HTTP response object is not shadowed
result = json.loads(response_body)
print(json.dumps(result, indent=2, ensure_ascii=False))
```

## Running as Python Script

You can also run the server directly as a Python script:

```bash
python ginzaserver/ginzaserver.py 8888 0
```

## Performance Optimizations

Recent improvements include:

- ✅ List comprehensions for faster token processing
- ✅ Removed unnecessary `del` statements
- ✅ Direct JSON encoding without intermediate variables
- ✅ GPU acceleration support (automatically enabled if available)
- ✅ Removed unused imports

## GPU Support

The server automatically detects and enables GPU acceleration if available:

```python
if spacy.prefer_gpu():
    spacy.require_gpu()
```

For CUDA support, install spaCy with the extra that matches your installed CUDA version:

```bash
# For CUDA 11.5 (quote the extra so the shell does not expand the brackets)
pip install -U 'spacy[cuda115]'
```

## Uninstallation

```bash
pip uninstall ginzaserver
```

## Troubleshooting

### Memory Issues

If the server is killed due to out-of-memory errors, check system logs:

```bash
# Linux
dmesg -T | grep -E -i -B100 'killed process'

# Check available memory
free -h
```

Consider using the `ja_ginza` model (option 0) instead of `ja_ginza_electra` if memory is limited.

### WSL/Container Localhost Access

When running in WSL or containers, you may need to bind to `0.0.0.0` instead of `127.0.0.1` to accept external connections. Modify the `ip` variable in `ginzaserver.py` if needed.

## License

Apache License 2.0

## Author

Hiroki Oya (oyahiroki@gmail.com)

## Links

- [GiNZA Official Documentation](https://megagonlabs.github.io/ginza/)
- [GitHub Repository](https://github.com/oyahiroki/ginzaserver)
- [spaCy Documentation](https://spacy.io/)
