Metadata-Version: 2.4
Name: rebrandly_otel
Version: 0.5.0
Summary: Python OTEL wrapper by Rebrandly
Home-page: https://gitlab.rebrandly.com/rebrandly/instrumentation/rebrandly-otel-python
Author: Antonio Romano
Author-email: antonio@rebrandly.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: opentelemetry-api>=1.35.0
Requires-Dist: opentelemetry-sdk>=1.35.0
Requires-Dist: opentelemetry-exporter-otlp>=1.35.0
Requires-Dist: opentelemetry-semantic-conventions>=0.60b0
Requires-Dist: opentelemetry-instrumentation-redis>=0.60b0
Requires-Dist: opentelemetry-instrumentation-botocore>=0.48b0
Requires-Dist: psutil>=5.0.0
Provides-Extra: flask
Requires-Dist: flask>=2.0.0; extra == "flask"
Requires-Dist: werkzeug>=2.0.0; extra == "flask"
Provides-Extra: fastapi
Requires-Dist: fastapi>=0.118.0; extra == "fastapi"
Requires-Dist: starlette>=0.32.0; extra == "fastapi"
Requires-Dist: uvicorn>=0.20.0; extra == "fastapi"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# rebrandly-otel (Python)

OpenTelemetry SDK for Rebrandly Python services.

## Installation

```bash
pip install rebrandly-otel
```

## Protocol & Transport

### HTTP/Protobuf Only (Port 4318)

This SDK uses **HTTP/Protobuf protocol exclusively** for exporting telemetry data. gRPC support has been intentionally removed to ensure maximum compatibility and reliability.

#### Why Not gRPC?

gRPC was removed in version **0.4.16** due to critical build and runtime constraints:

**1. Dependency Conflicts**
- The `grpcio` package has severe version conflicts with `protobuf`
- `opentelemetry-exporter-otlp-proto-grpc` requires `protobuf < 6.0`
- Modern `grpcio` versions (1.72+) require `protobuf >= 6.30`
- These constraints are incompatible and cause dependency resolution failures
- Protobuf generated code must exactly match the runtime library version (by design)

**2. Binary Compatibility Issues**
- `grpcio` contains native C extensions that must be compiled for each:
  - Python version (3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13)
  - Platform (Linux, macOS, Windows)
  - Architecture (x86_64, arm64)
- Pre-built wheels don't always match target environments
- Building from source requires compilation toolchain and matching protobuf versions

**3. AWS Lambda Runtime Constraints**
- Lambda functions freeze and thaw between invocations
- The SDK package must be built for the **exact same Python runtime** as the Lambda environment
- A package built with Python 3.11 will fail on Lambda Python 3.10 runtime
- Cross-platform builds (macOS → Linux Lambda) can cause binary incompatibilities
- These issues are eliminated with HTTP/Protobuf (no native dependencies)

**4. OpenTelemetry Recommendation**
- The OTLP specification recommends HTTP/Protobuf as the default protocol
- HTTP is simpler, more widely supported, and easier to debug
- Performance difference is negligible for most use cases

#### Default Port: 4318

When `OTEL_EXPORTER_OTLP_ENDPOINT` is set without a port, the SDK **automatically defaults to port 4318** (the standard HTTP/Protobuf port).

**Examples**:
- `http://collector:4318` → Used as-is
- `http://collector` → Automatically becomes `http://collector:4318`
- `https://otel.example.com` → Automatically becomes `https://otel.example.com:4318`

**Port Reference**:
- **4317**: gRPC protocol (not supported by this SDK)
- **4318**: HTTP/Protobuf protocol (used by this SDK)
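
The defaulting rule above can be sketched with the standard library alone. This is only an illustration of the behaviour described, not the SDK's actual implementation; `with_default_port` and `DEFAULT_OTLP_HTTP_PORT` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_OTLP_HTTP_PORT = 4318  # standard OTLP HTTP/Protobuf port


def with_default_port(endpoint: str, default_port: int = DEFAULT_OTLP_HTTP_PORT) -> str:
    """Return `endpoint` unchanged if it already has a port, else append one.

    Hypothetical helper: the SDK's internal logic may differ.
    """
    parts = urlsplit(endpoint)
    # Leave the value alone when a port is present or no host could be parsed
    if parts.port is not None or not parts.hostname:
        return endpoint
    netloc = f"{parts.netloc}:{default_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    for url in ("http://collector:4318", "http://collector", "https://otel.example.com"):
        print(url, "->", with_default_port(url))
```

Note that under this rule an HTTPS endpoint served on the default port 443 needs the port written out explicitly (`https://otel.example.com:443`).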

#### Collector Configuration

Ensure your OpenTelemetry Collector is configured to accept HTTP/Protobuf on port 4318:

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
```

You do **not** need gRPC configuration for this SDK.

#### Migration from gRPC

If you're migrating from a gRPC-based setup:

1. Update collector to accept HTTP on port 4318 (usually already enabled)
2. Change endpoint to use port 4318 instead of 4317:
   ```bash
   # Before (gRPC)
   export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317

   # After (HTTP/Protobuf)
   export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318
   # or simply (port auto-added)
   export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector
   ```
3. Remove any `grpcio` or `opentelemetry-exporter-otlp-proto-grpc` dependencies
4. No code changes required - the SDK handles everything automatically
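
To confirm step 3, one option is to check the active environment for leftover gRPC packages. The sketch below uses only `importlib.metadata` from the standard library; `find_installed` is a hypothetical helper and is not part of the SDK.

```python
from importlib import metadata

# Packages named in step 3 above
GRPC_PACKAGES = ("grpcio", "opentelemetry-exporter-otlp-proto-grpc")


def find_installed(packages=GRPC_PACKAGES):
    """Map each installed package name to its version; skip missing ones."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


if __name__ == "__main__":
    for name, version in find_installed().items():
        print(f"{name}=={version} is still installed")
```

An empty result means neither package is present in the current environment. Transitive dependencies of other libraries may still pull them back in.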

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `OTEL_SERVICE_NAME` | Yes | Service identifier |
| `OTEL_SERVICE_APPLICATION` | Yes | Application namespace (groups services) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Yes | OTLP collector endpoint (HTTP/Protobuf, port 4318). See [Protocol & Transport](#protocol--transport) |
| `OTEL_REPO_NAME` | No | Repository name |
| `OTEL_COMMIT_ID` | No | Commit ID for version tracking |
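
A service can fail fast at startup when a required variable is missing. The sketch below checks the three required names from the table; `missing_required` is a hypothetical helper, and the SDK may perform its own validation.

```python
import os

# Variables marked "Yes" in the table above
REQUIRED_VARS = (
    "OTEL_SERVICE_NAME",
    "OTEL_SERVICE_APPLICATION",
    "OTEL_EXPORTER_OTLP_ENDPOINT",
)


def missing_required(env=None):
    """Return the required variable names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        print("Missing required variables:", ", ".join(missing))
```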

## Lambda Handler

```python
from rebrandly_otel import lambda_handler, logger

@lambda_handler(name="my-function")
def handler(event, context):
    logger.info("Processing", extra={"event_id": event.get("id")})
    return {"statusCode": 200}
```

## AWS Message Handler

```python
from rebrandly_otel import aws_message_handler

@aws_message_handler(name="process-message")
def process_record(record):
    # trace context automatically extracted from message
    return {"success": True}
```

## Framework Middleware

### Flask

```python
from flask import Flask
from rebrandly_otel import otel, setup_flask

app = Flask(__name__)
setup_flask(otel, app)

@app.route('/api/users')
def get_users():
    return {"users": []}
```

### FastAPI

```python
from fastapi import FastAPI
from rebrandly_otel import otel, setup_fastapi

app = FastAPI()
setup_fastapi(otel, app)

@app.get('/api/users')
async def get_users():
    return {"users": []}
```

## Custom Instrumentation

### Manual Spans

```python
from rebrandly_otel import otel

user_id = "user-123"  # example value; use your real identifier

with otel.span("operation-name", attributes={"user.id": user_id}):
    pass  # your code goes here
```

### Structured Logging

```python
from rebrandly_otel import logger

order_id, amount = "ord-123", 49.90  # example values

logger.info("Order processed", extra={"order_id": order_id, "amount": amount})
```

## HTTP Client Tracing

### Using requests

```python
from rebrandly_otel import requests_with_tracing

session = requests_with_tracing()
response = session.get('https://api.rebrandly.com/v1/links')
```

### Using httpx

```python
from rebrandly_otel import httpx_with_tracing

client = httpx_with_tracing()
response = client.get('https://api.rebrandly.com/v1/links')
```

### Manual Header Injection

```python
from rebrandly_otel import inject_traceparent

headers = {'Content-Type': 'application/json'}
inject_traceparent(headers)
# headers now includes traceparent
```
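
For debugging propagation, it helps to know what the injected value looks like. Per the W3C Trace Context specification, `traceparent` has the form `version-traceid-parentid-flags` in lowercase hex. The parser below is an illustrative sketch for that four-field layout; `parse_traceparent` is a hypothetical helper and is not exported by the SDK.

```python
import re

# W3C Trace Context: version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value: str):
    """Split a traceparent header into its fields, or return None if malformed."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the specification
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest flag bit = sampled
    }
```

Passing `headers["traceparent"]` to this function after injection is a quick way to confirm that a valid context was written.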

## Custom Metrics

```python
from rebrandly_otel import meter

# Counter
request_counter = meter.meter.create_counter(
    name='http.requests.total',
    description='Total HTTP requests'
)
request_counter.add(1, {'method': 'GET', 'endpoint': '/api/users'})

# Histogram
duration = meter.meter.create_histogram(
    name='http.request.duration',
    description='Request duration in ms',
    unit='ms'
)
duration.record(123, {'endpoint': '/api/users'})

# Gauge
gauge = meter.meter.create_gauge(
    name='queue.size',
    description='Current queue size'
)
gauge.set(42)  # synchronous gauges in the OpenTelemetry Python API use set(), not record()
```

## Database Instrumentation

### PyMySQL

```python
import pymysql
from rebrandly_otel import otel, instrument_pymysql

connection = pymysql.connect(host='localhost', user='user', password='pass', database='db')
connection = instrument_pymysql(otel, connection)

# All queries now automatically traced
with connection.cursor() as cursor:
    cursor.execute("SELECT * FROM users WHERE id = %s", (123,))
```

### SQLite3

```python
import sqlite3
from rebrandly_otel import otel, instrument_sqlite3

# Create connection
connection = sqlite3.connect('database.db')  # or ':memory:'

# Instrument connection
connection = instrument_sqlite3(otel, connection, options={
    'slow_query_threshold_ms': 1000,
    'capture_bindings': False
})

# Use normally - all queries are traced
cursor = connection.cursor()
cursor.execute("SELECT * FROM users WHERE id = ?", (123,))

# SQLite also supports direct connection execution
connection.execute("CREATE TABLE test (id INTEGER)")
```

### Redis

Redis operations are automatically traced - just initialize the SDK:

```python
from rebrandly_otel import otel
import redis

otel.initialize()  # Redis instrumentation enabled automatically

client = redis.Redis(host='localhost', port=6379, db=0)
client.set('key', 'value')  # Automatically traced
```

**Note:** Unlike PyMySQL/SQLite3, Redis requires no explicit instrumentation call. All Redis clients (including async and cluster) are automatically traced when the SDK initializes.

## AWS Message Handling (SQS/SNS)

### Sending with Trace Context

```python
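import json
import boto3
from rebrandly_otel import otel

sqs = boto3.client("sqs")
# `url` (queue URL) and `data` (payload dict) are supplied by your application
trace_attrs = otel.tracer.get_attributes_for_aws_from_context()
sqs.send_message(QueueUrl=url, MessageBody=json.dumps(data), MessageAttributes=trace_attrs)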
```

### Receiving with Context Extraction

```python
from rebrandly_otel import aws_message_span

# `record` is a single SQS/SNS record from the incoming event
with aws_message_span("process-message", message=record):
    pass  # trace context is extracted automatically; process the record here
```
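
When debugging propagation across SQS, it can help to see how trace headers travel as message attributes. The sketch below converts between a flat header dict and the two shapes AWS uses: boto3's `MessageAttributes` on send (`DataType`/`StringValue`) and the Lambda SQS event record on receive (`dataType`/`stringValue`). Both helper names are hypothetical, and the exact attributes the SDK writes are an assumption here. In normal use the SDK functions above do this for you.

```python
def to_sqs_message_attributes(carrier):
    """Convert a flat header dict into the boto3 SQS MessageAttributes shape."""
    return {
        key: {"DataType": "String", "StringValue": value}
        for key, value in carrier.items()
    }


def carrier_from_sqs_record(record):
    """Rebuild the flat header dict from a Lambda SQS event record."""
    attrs = record.get("messageAttributes") or {}
    return {
        key: attr["stringValue"]
        for key, attr in attrs.items()
        if attr.get("dataType") == "String" and "stringValue" in attr
    }


if __name__ == "__main__":
    carrier = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    print(to_sqs_message_attributes(carrier))
```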

## Force Flush (Critical for Lambda)

```python
from rebrandly_otel import force_flush, shutdown

# Before Lambda exits
force_flush(timeout_millis=5000)
shutdown()
```
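
A fixed 5000 ms timeout can exceed the time a Lambda invocation has left. One approach is to derive the flush timeout from the remaining time reported by the Lambda context object (`context.get_remaining_time_in_millis()`). The helper below is a hypothetical sketch; the 500 ms reserve is an assumed value, not an SDK default.

```python
def flush_timeout_millis(remaining_millis, reserve_millis=500, cap_millis=5000):
    """Pick a flush timeout that fits in the time the invocation has left.

    remaining_millis: e.g. context.get_remaining_time_in_millis() in Lambda
    reserve_millis:   time kept back for returning the response (assumed value)
    cap_millis:       upper bound, matching the 5000 ms used above
    """
    available = remaining_millis - reserve_millis
    # Never return a negative timeout and never exceed the cap
    return max(0, min(cap_millis, available))
```

Inside a handler this could be used as `force_flush(timeout_millis=flush_timeout_millis(context.get_remaining_time_in_millis()))`.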

## Span Status Methods

```python
from rebrandly_otel import otel

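try:
    risky_operation()  # placeholder for your own code
    otel.tracer.set_span_success()
except Exception as e:
    # Pass the exception to record it on the span; omit it to set only the message
    otel.tracer.set_span_error("Operation failed", exception=e)
    raise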
```

## Cost Optimization (Errors-Only Filtering)

For high-volume services, filter out successful spans to reduce costs by 90-99%:

```bash
export OTEL_SPAN_ATTRIBUTES="span.filter=errors-only"
```

This adds the filter attribute to all spans. The OTEL Gateway drops successful spans while keeping all errors. Metrics are still generated from 100% of traces at the agent level.
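
The variable above holds a `key=value` pair. The sketch below shows one way such a string could be turned into span attributes, assuming multiple pairs are comma-separated in the same way as `OTEL_RESOURCE_ATTRIBUTES`. That format and the name `parse_span_attributes` are assumptions; the SDK's own parsing may differ.

```python
def parse_span_attributes(raw):
    """Parse 'k1=v1,k2=v2' into a dict, skipping empty or malformed pairs."""
    attributes = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Keep only pairs that contain '=' and have a non-empty key
        if sep and key:
            attributes[key] = value
    return attributes


if __name__ == "__main__":
    print(parse_span_attributes("span.filter=errors-only"))
```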

## Tips

- Always call `force_flush()` before Lambda exits
- Use `OTEL_DEBUG=true` for local debugging
- Keep metric cardinality low (fewer than 1000 attribute-value combinations)
- Add 2-3 seconds buffer to Lambda timeout for flush

## Troubleshooting

**No Data Exported:**
- Verify `OTEL_EXPORTER_OTLP_ENDPOINT` is set
- Enable `OTEL_DEBUG=true` for console output
- Check network connectivity to collector

**Missing Traces in Lambda:**
- Ensure `force_flush()` is called before exit
- Add 2-3s buffer to Lambda timeout
- Use the `@lambda_handler` decorator with `auto_flush=True`

**Context Not Propagating:**
- Sending: Use `otel.tracer.get_attributes_for_aws_from_context()` for SQS/SNS
- HTTP: Use `inject_traceparent(headers)` before requests
- Receiving: Use `aws_message_span` context manager

**Wrong Port / Connection Refused:**
- This SDK uses HTTP/Protobuf protocol on port **4318** (not gRPC port 4317)
- If no port is specified, the SDK defaults to 4318 automatically
- Verify your collector accepts HTTP on port 4318
- See [Protocol & Transport](#protocol--transport) for details

## Best Practices

**Do:**
- Use context managers for spans (auto-cleanup)
- Use meaningful span names (`fetch-user-profile`, not `handler`)
- Add business context (`order.id`, `user.id`) to spans
- Flush telemetry before Lambda exits
- Use bounded attribute values in metrics

**Don't:**
- Store large payloads in span attributes (keep them under 1KB)
- Use high-cardinality attributes in metrics (`user_id`, `request_id`)
- Hardcode service names (use env vars)
- Skip error recording in except blocks
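
One way to keep metric attributes bounded is to record a route template instead of the raw request path, so that IDs do not create a new time series per user or request. The sketch below replaces numeric and UUID path segments with `{id}`; `route_template` is a hypothetical helper, and which segments count as IDs is an assumption to adapt to your routes.

```python
import re

# A path segment that is entirely digits or a UUID
_ID_SEGMENT = re.compile(
    r"/(?:\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})(?=/|$)"
)


def route_template(path):
    """Replace ID-like path segments with '{id}' to bound attribute cardinality."""
    return _ID_SEGMENT.sub("/{id}", path)


if __name__ == "__main__":
    print(route_template("/api/users/123/orders/45"))
```

The result can then be used as the `endpoint` attribute when recording a counter or histogram.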
