Concurrent Streams

Concurrent streams are the foundation of S.O.T.A. SYSTEMS pricing. Understanding how they work helps you choose the right tier and optimize your usage.

What is a Concurrent Stream?

A concurrent stream is one active API request being processed at any given moment.

Think of streams like server threads in a connection pool:

  • 3 streams = 3 available threads
  • 10 streams = 10 available threads
  • Each thread handles one request at a time
  • When a thread completes, it picks up the next queued request

Key insight: Streams limit how many requests run simultaneously, not how many you can make in total.
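
If the connection-pool analogy helps, here is a minimal JavaScript sketch of the same idea: a small semaphore that admits at most three callers at once while everyone else waits in line. This is purely illustrative, not part of the S.O.T.A. SYSTEMS API or SDK.

// Conceptual sketch only: a 3-permit semaphore behaves like a 3-stream tier
class Semaphore {
  constructor(permits) {
    this.permits = permits;
    this.waiters = [];
  }

  async acquire() {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise(resolve => this.waiters.push(resolve)); // Wait in queue
  }

  release() {
    const next = this.waiters.shift();
    if (next) next(); // Hand the freed slot straight to the next waiter
    else this.permits++;
  }
}

const streams = new Semaphore(3); // Solo tier: 3 concurrent streams

async function sendRequest(makeApiCall) {
  await streams.acquire(); // Waits here if all 3 streams are busy
  try {
    return await makeApiCall();
  } finally {
    streams.release(); // A freed stream immediately starts a queued request
  }
}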

How Streams Work

Example: Solo Tier (3 Streams)

You have 3 concurrent streams. Here's what happens:

Scenario 1: Light Usage

Request 1 → Stream 1 ✓ (processing)
Request 2 → Stream 2 ✓ (processing)

Result: Both requests start immediately. You're using 2/3 streams.

Scenario 2: Peak Usage

Request 1 → Stream 1 ✓ (processing)
Request 2 → Stream 2 ✓ (processing)
Request 3 → Stream 3 ✓ (processing)
Request 4 → Queue (waiting)
Request 5 → Queue (waiting)

Result: First 3 requests process immediately. Requests 4-5 wait in queue (max 60 seconds).

Scenario 3: Stream Frees Up

Stream 1 finishes → Request 4 starts immediately
Stream 2 finishes → Request 5 starts immediately

Result: Queued requests automatically start as streams become available.

Queue Behavior

When all your streams are busy, new requests enter a queue:

  • Max wait time: 60 seconds
  • Automatic processing: Requests start as soon as a stream is free
  • No manual intervention: Queue management is automatic
  • Timeout response: If wait exceeds 60s, you receive a 408 Request Timeout

HTTP Status Codes

  • 200 OK - Request processed successfully
  • 202 Accepted - Request queued, will process soon
  • 408 Request Timeout - Queue wait exceeded 60 seconds
  • 429 Too Many Requests - Queue is full
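
As a rough illustration, the sketch below branches on each of these codes using plain fetch. Whether a queued request actually surfaces a 202 to the client, or simply blocks until a stream frees, is an assumption here; adapt the handling to your HTTP client's behavior.

const res = await fetch('https://ai.sota.systems/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.SOTA_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'llama-3.1-70b-instruct',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});

switch (res.status) {
  case 200: // Processed on a free stream
    console.log(await res.json());
    break;
  case 202: // Queued; will process as soon as a stream frees up
    console.log('Request queued.');
    break;
  case 408: // Queue wait exceeded 60 seconds
    console.warn('Queue timeout - retry with backoff or reduce concurrency.');
    break;
  case 429: // Queue is full
    console.warn('Queue full - back off before retrying.');
    break;
}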

Choosing the Right Tier

Solo (3 Streams) - Best for:

  • Individual developers testing and building
  • Low-traffic applications (< 100 requests/hour)
  • Sequential workflows where requests happen one after another
  • Development and staging environments

Example use case: Personal assistant chatbot with 50 users

Team (10 Streams) - Best for:

  • Production applications with moderate traffic
  • Parallel processing needs (bulk operations, batch jobs)
  • Small teams or agencies serving multiple clients
  • Consistent traffic throughout the day

Example use case: SaaS product with 500-1000 active users

Platform (Custom) - Best for:

  • High-traffic applications (1000+ requests/hour)
  • Enterprise deployments with strict SLA requirements
  • Resellers serving multiple downstream customers
  • Burst traffic patterns requiring large stream counts

Example use case: AI API reseller or high-volume data processing

Optimizing Stream Usage

1. Match Request Patterns to Tier

If most of your requests are:

  • Sequential (chatbot conversations): Solo tier works great
  • Parallel (batch document processing): Consider Team or Platform

2. Handle Queue Timeouts Gracefully

import OpenAI from 'openai';
 
const client = new OpenAI({
  baseURL: 'https://ai.sota.systems/v1',
  apiKey: process.env.SOTA_API_KEY,
  timeout: 65000, // Slightly longer than queue timeout
  maxRetries: 2, // Retry on timeout
});
 
try {
  const response = await client.chat.completions.create({
    model: 'llama-3.1-70b-instruct',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
} catch (error) {
  if (error.status === 408) {
    console.log('Queue timeout - all streams busy. Retry or upgrade tier.');
  }
}

3. Monitor Your Usage

Check your dashboard (available at launch) to see:

  • Current stream usage: How many streams are active right now
  • Peak usage times: When you hit your stream limit most often
  • Queue wait times: Average time requests spend in queue
  • Timeout rate: How often requests exceed 60s queue wait

4. Upgrade Signals

Consider upgrading to a higher tier if:

  • Queue timeouts happen regularly (>5% of requests)
  • Average queue wait time exceeds 10 seconds
  • You consistently max out streams during business hours
  • You need to process batches faster

Streams vs. Traditional Rate Limits

Traditional APIs limit you by:

  • Requests per minute (e.g., 60 RPM)
  • Tokens per minute (e.g., 100K TPM)
  • Daily quotas (e.g., 10M tokens/day)

S.O.T.A. SYSTEMS only limits:

  • Concurrent streams (how many requests run simultaneously)

This means:

  • No daily/monthly limits
  • No token counting
  • No throttling based on usage
  • Predictable performance

The only constraint is parallelism, not total volume.

Real-World Examples

Example 1: Chatbot Application

Setup: Solo tier (3 streams), 100 daily users, average 10 messages/user

Analysis:

  • Total daily requests: 1,000
  • Peak concurrency: ~2-3 simultaneous conversations
  • Verdict: Solo tier is perfect. Requests are naturally sequential (users wait for responses).

Example 2: Document Processing Pipeline

Setup: Processing 1,000 PDFs, extracting summaries

Without parallelism:

// Sequential - one document at a time
for (const pdf of pdfs) {
  await processDocument(pdf); // ~10s each × 1,000 PDFs = 10,000s (~2.8 hours)
}

With parallelism (Team tier, 10 streams):

// Parallel batches - a simple chunk helper splits the PDFs into groups of 10
const chunk = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, (i + 1) * size));

const batches = chunk(pdfs, 10); // Batches of 10, one per stream
for (const batch of batches) {
  await Promise.all(batch.map(pdf => processDocument(pdf))); // 10 at once
}
// Total time: ~1,000s (~17 minutes, 10x faster)

Verdict: Team tier enables 10x faster processing through parallelism.

Example 3: API Reseller

Setup: Platform tier (e.g., 50 streams), serving 20 downstream customers

Analysis:

  • Each customer might send 2-5 concurrent requests
  • Peak load: 40-50 simultaneous requests across all customers (they rarely all peak at the same moment)
  • Verdict: Platform tier with custom allocation matches your resale model.
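
One way a reseller might enforce fair sharing of a pooled allocation is a per-customer concurrency cap in the gateway. The sketch below is hypothetical (PER_CUSTOMER_CAP, forwardRequest, and the rejection behavior are all inventions for this example); a production gateway might queue excess requests instead of rejecting them.

// Hypothetical sketch: cap each downstream customer's share of the pooled
// streams so one busy customer can't starve the others
const PER_CUSTOMER_CAP = 5; // Illustrative value; tune to your allocation
const inFlight = new Map();

async function forwardRequest(customerId, makeApiCall) {
  const current = inFlight.get(customerId) ?? 0;
  if (current >= PER_CUSTOMER_CAP) {
    // A real gateway might queue here instead of rejecting outright
    throw new Error(`Customer ${customerId} is at their concurrency cap`);
  }
  inFlight.set(customerId, current + 1);
  try {
    return await makeApiCall();
  } finally {
    inFlight.set(customerId, inFlight.get(customerId) - 1);
  }
}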

Frequently Asked Questions

Does each API key get its own streams?

No. Streams are shared across all API keys in your account. If you have 3 streams and 2 API keys, they share the same 3-stream pool.

Can I buy extra streams without upgrading tiers?

Not currently. Stream counts are fixed per tier. We're considering add-on streams for Q2 2026.

What happens if I exceed 60s in the queue?

You receive a 408 Request Timeout error. Your application should retry the request or notify the user.

Do streaming responses (SSE) count as one stream?

Yes. A streaming response (like chat with stream: true) occupies one stream for its entire duration, from first token to last.
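
For example, this sketch (reusing the client configured in the timeout-handling snippet above) occupies one stream for as long as the loop runs:

// One stream is held from the first token to the last
const stream = await client.chat.completions.create({
  model: 'llama-3.1-70b-instruct',
  messages: [{ role: 'user', content: 'Write a haiku.' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// Only now, after the final chunk, is the stream released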

Can I see real-time stream usage?

Yes, your dashboard (available at launch) shows live stream usage and queue metrics.

Do failed requests count against my streams?

Failed requests (4xx/5xx errors) release their stream immediately after failing. A stream is only occupied for the duration of the actual processing attempt.
