Concurrent Streams

Concurrent streams are the foundation of S.O.T.A. SYSTEMS pricing. Understanding how they work helps you choose the right tier and optimize your usage.

What is a Concurrent Stream?

A concurrent stream is one active API request being processed at any given moment.

Think of streams like server threads in a connection pool:

  • 3 streams = 3 available threads
  • 10 streams = 10 available threads
  • Each thread handles one request at a time
  • When a thread completes, it picks up the next queued request

Key insight: Streams limit how many requests run simultaneously, not how many you can make in total.
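
If the connection-pool analogy helps, here is a minimal JavaScript sketch of the same idea: a small semaphore that admits at most three callers at once while everyone else waits in line. This is purely illustrative, not part of the S.O.T.A. SYSTEMS API or SDK.

// Conceptual sketch only: a 3-permit semaphore behaves like a 3-stream tier
class Semaphore {
  constructor(permits) {
    this.permits = permits;
    this.waiters = [];
  }

  async acquire() {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise(resolve => this.waiters.push(resolve)); // Wait in queue
  }

  release() {
    const next = this.waiters.shift();
    if (next) next(); // Hand the freed slot straight to the next waiter
    else this.permits++;
  }
}

const streams = new Semaphore(3); // Solo tier: 3 concurrent streams

async function sendRequest(makeApiCall) {
  await streams.acquire(); // Waits here if all 3 streams are busy
  try {
    return await makeApiCall();
  } finally {
    streams.release(); // A freed stream immediately starts a queued request
  }
}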

How Streams Work

Example: Solo Tier (3 Streams)

You have 3 concurrent streams. Here's what happens:

Scenario 1: Light Usage

Request 1 → Stream 1 ✓ (processing)
Request 2 → Stream 2 ✓ (processing)

Result: Both requests start immediately. You're using 2/3 streams.

Scenario 2: Peak Usage

Request 1 → Stream 1 ✓ (processing)
Request 2 → Stream 2 ✓ (processing)
Request 3 → Stream 3 ✓ (processing)
Request 4 → Queue (waiting)
Request 5 → Queue (waiting)

Result: First 3 requests process immediately. Requests 4-5 wait in queue (max 60 seconds).

Scenario 3: Stream Frees Up

Stream 1 finishes → Request 4 starts immediately
Stream 2 finishes → Request 5 starts immediately

Result: Queued requests automatically start as streams become available.

Queue Behavior

When all your streams are busy, new requests enter a queue:

  • Max wait time: 60 seconds
  • Automatic processing: Requests start as soon as a stream is free
  • No manual intervention: Queue management is automatic
  • Timeout response: If wait exceeds 60s, you receive a 408 Request Timeout

HTTP Status Codes

  • 200 OK - Request processed successfully
  • 202 Accepted - Request queued, will process soon
  • 408 Request Timeout - Queue wait exceeded 60 seconds
  • 429 Too Many Requests - Queue is full
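
As a rough illustration, the sketch below branches on each of these codes using plain fetch. Whether a queued request actually surfaces a 202 to the client, or simply blocks until a stream frees, is an assumption here; adapt the handling to your HTTP client's behavior.

const res = await fetch('https://ai.sota.systems/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.SOTA_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'llama-3.1-70b-instruct',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

switch (res.status) {
  case 200: // Processed on a free stream
    console.log(await res.json());
    break;
  case 202: // Queued; will process as soon as a stream frees up
    console.log('Request queued.');
    break;
  case 408: // Queue wait exceeded 60 seconds
    console.warn('Queue timeout - retry with backoff or reduce concurrency.');
    break;
  case 429: // Queue is full
    console.warn('Queue full - back off before retrying.');
    break;
}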

Choosing the Right Tier

Solo (3 Streams) - Best for:

  • Individual developers testing and building
  • Low-traffic applications (< 100 requests/hour)
  • Sequential workflows where requests happen one after another
  • Development and staging environments

Example use case: Personal assistant chatbot with 50 users

Team (10 Streams) - Best for:

  • Production applications with moderate traffic
  • Parallel processing needs (bulk operations, batch jobs)
  • Small teams or agencies serving multiple clients
  • Consistent traffic throughout the day

Example use case: SaaS product with 500-1000 active users

Platform (Custom) - Best for:

  • High-traffic applications (1000+ requests/hour)
  • Enterprise deployments with strict SLA requirements
  • Resellers serving multiple downstream customers
  • Burst traffic patterns requiring large stream counts

Example use case: AI API reseller or high-volume data processing

Optimizing Stream Usage

1. Match Request Patterns to Tier

If most of your requests are:

  • Sequential (chatbot conversations): Solo tier works great
  • Parallel (batch document processing): Consider Team or Platform

2. Handle Queue Timeouts Gracefully

import OpenAI from 'openai';
 
const client = new OpenAI({
  baseURL: 'https://ai.sota.systems/v1',
  apiKey: process.env.SOTA_API_KEY,
  timeout: 65000, // Slightly longer than queue timeout
  maxRetries: 2, // Retry on timeout
});
 
try {
  const response = await client.chat.completions.create({
    model: 'llama-3.1-70b-instruct',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
} catch (error) {
  if (error.status === 408) {
    console.log('Queue timeout - all streams busy. Retry or upgrade tier.');
  }
}

3. Monitor Your Usage

Check your dashboard (available at launch) to see:

  • Current stream usage: How many streams are active right now
  • Peak usage times: When you hit your stream limit most often
  • Queue wait times: Average time requests spend in queue
  • Timeout rate: How often requests exceed 60s queue wait

4. Upgrade Signals

Consider upgrading to a higher tier if:

  • Queue timeouts happen regularly (>5% of requests)
  • Average queue wait time exceeds 10 seconds
  • You consistently max out streams during business hours
  • You need to process batches faster

Streams vs. Traditional Rate Limits

Traditional APIs limit you by:

  • Requests per minute (e.g., 60 RPM)
  • Tokens per minute (e.g., 100K TPM)
  • Daily quotas (e.g., 10M tokens/day)

S.O.T.A. SYSTEMS only limits:

  • Concurrent streams (how many requests run simultaneously)

This means:

  • No daily/monthly limits
  • No token counting
  • No throttling based on usage
  • Predictable performance

The only constraint is parallelism, not total volume.

Real-World Examples

Example 1: Chatbot Application

Setup: Solo tier (3 streams), 100 daily users, average 10 messages/user

Analysis:

  • Total daily requests: 1,000
  • Peak concurrency: ~2-3 simultaneous conversations
  • Verdict: Solo tier is perfect. Requests are naturally sequential (users wait for responses).

Example 2: Document Processing Pipeline

Setup: Processing 1,000 PDFs, extracting summaries

Without parallelism:

// Sequential - one document at a time
for (const pdf of pdfs) {
  await processDocument(pdf); // ~10s each × 1,000 PDFs = 10,000s (~2.8 hours)
}

With parallelism (Team tier, 10 streams):

// Parallel batches - a simple chunk helper splits the PDFs into groups of 10
const chunk = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, (i + 1) * size));

const batches = chunk(pdfs, 10); // Batches of 10, one per stream
for (const batch of batches) {
  await Promise.all(batch.map(pdf => processDocument(pdf))); // 10 at once
}
// Total time: ~1,000s (~17 minutes, 10x faster)

Verdict: Team tier enables 10x faster processing through parallelism.

Example 3: API Reseller

Setup: Platform tier (e.g., 50 streams), serving 20 downstream customers

Analysis:

  • Each customer might send 2-5 concurrent requests
  • Peak load: 40-50 simultaneous requests across all customers (they rarely all peak at the same moment)
  • Verdict: Platform tier with custom allocation matches your resale model.
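
One way a reseller might enforce fair sharing of a pooled allocation is a per-customer concurrency cap in the gateway. The sketch below is hypothetical (PER_CUSTOMER_CAP, forwardRequest, and the rejection behavior are all inventions for this example); a production gateway might queue excess requests instead of rejecting them.

// Hypothetical sketch: cap each downstream customer's share of the pooled
// streams so one busy customer can't starve the others
const PER_CUSTOMER_CAP = 5; // Illustrative value; tune to your allocation
const inFlight = new Map();

async function forwardRequest(customerId, makeApiCall) {
  const current = inFlight.get(customerId) ?? 0;
  if (current >= PER_CUSTOMER_CAP) {
    // A real gateway might queue here instead of rejecting outright
    throw new Error(`Customer ${customerId} is at their concurrency cap`);
  }
  inFlight.set(customerId, current + 1);
  try {
    return await makeApiCall();
  } finally {
    inFlight.set(customerId, inFlight.get(customerId) - 1);
  }
}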

Frequently Asked Questions

Does each API key get its own streams?

No. Streams are shared across all API keys in your account. If you have 3 streams and 2 API keys, they share the same 3-stream pool.

Can I buy extra streams without upgrading tiers?

Not currently. Stream counts are fixed per tier. We're considering add-on streams for Q2 2026.

What happens if I exceed 60s in the queue?

You receive a 408 Request Timeout error. Your application should retry the request or notify the user.

Do streaming responses (SSE) count as one stream?

Yes. A streaming response (like chat with stream: true) occupies one stream for its entire duration, from first token to last.
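
For example, this sketch (reusing the client configured in the timeout-handling snippet above) occupies one stream for as long as the loop runs:

// One stream is held from the first token to the last
const stream = await client.chat.completions.create({
  model: 'llama-3.1-70b-instruct',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Only now, after the final chunk, is the stream released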

Can I see real-time stream usage?

Yes, your dashboard (available at launch) shows live stream usage and queue metrics.

Do failed requests count against my streams?

Failed requests (4xx/5xx errors) release their stream immediately after failing. A stream is only occupied for the duration of the actual processing attempt.
