Qwen2.5-Coder-32B-Instruct is a powerful 32-billion-parameter model specialized for coding tasks.
| Provider | Model | Quantization | Context | Max Output | Throughput | Latency | Uptime | Input Price | Output Price |
|---|---|---|---|---|---|---|---|---|---|
| DeepInfra | qwen/qwen-2.5-coder-32b-instruct | fp8 | 33K | 16K | 15.2 TPS | 0.85s | 99.5% | $0.06 | $0.15 |
| Lambda | qwen/qwen-2.5-coder-32b-instruct | bf16 | 33K | 33K | 12.8 TPS | 1.20s | 98.9% | $0.07 | $0.16 |
| Together | qwen/qwen-2.5-coder-32b-instruct | int8 | 33K | 8K | 18.5 TPS | 0.65s | 99.8% | $0.055 | $0.14 |
| Fireworks | qwen/qwen-2.5-coder-32b-instruct | fp16 | 33K | 16K | — | — | 97.2% | $0.08 | $0.18 |
Context Length: 32,768 tokens
Architecture: Transformer
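The per-request cost implied by the table can be sketched as follows. Note that the listing does not state the pricing unit; the sketch below assumes the common convention of USD per million tokens, and the `request_cost` helper is hypothetical, not part of any provider's API.

```python
# Price table copied from the provider comparison above.
# Assumption (not stated in the listing): prices are USD per 1M tokens.
PROVIDERS = {
    "DeepInfra": {"input": 0.06,  "output": 0.15},
    "Lambda":    {"input": 0.07,  "output": 0.16},
    "Together":  {"input": 0.055, "output": 0.14},
    "Fireworks": {"input": 0.08,  "output": 0.18},
}

def request_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request against the named provider."""
    p = PROVIDERS[provider]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10K-token prompt with a 2K-token completion on Together.
cost = request_cost("Together", 10_000, 2_000)  # 0.00083 under the assumed unit
```

Under these assumptions, the cheapest listed provider (Together) also posts the highest throughput and uptime, though its 8K max output is the smallest in the table.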