Concurrent Streams
Concurrent streams are the foundation of S.O.T.A. SYSTEMS pricing. Understanding how they work helps you choose the right tier and optimize your usage.
What is a Concurrent Stream?
A concurrent stream is one active API request being processed at any given moment.
Think of streams like server threads in a connection pool:
- 3 streams = 3 available threads
- 10 streams = 10 available threads
- Each thread handles one request at a time
- When a thread completes, it picks up the next queued request
Key insight: Streams limit how many requests run simultaneously, not how many you can make total.
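The thread-pool analogy above can be sketched in a few lines. This is a local simulation, not a call to the real API: a semaphore plays the role of the 3-stream pool, and five simulated requests contend for it. No matter how many requests arrive, at most 3 run at once.

```python
import threading
import time

STREAMS = 3  # Solo tier: 3 concurrent streams
pool = threading.Semaphore(STREAMS)

active = 0   # requests currently "processing"
peak = 0     # highest concurrency observed
lock = threading.Lock()

def handle_request(i):
    global active, peak
    with pool:                  # acquire a stream; blocks while all 3 are busy
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.1)         # simulated processing time
        with lock:
            active -= 1

# Fire 5 "simultaneous" requests at a 3-stream pool.
threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the run, `peak` never exceeds 3: requests 4 and 5 simply waited for a stream to free up, which is exactly the queueing behavior described below.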
How Streams Work
Example: Solo Tier (3 Streams)
You have 3 concurrent streams. Here's what happens:
Scenario 1: Light Usage
You send 2 requests at the same time.
Result: Both requests start immediately. You're using 2 of your 3 streams.
Scenario 2: Peak Usage
You send 5 requests at once.
Result: The first 3 requests process immediately. Requests 4-5 wait in the queue (max 60 seconds).
Scenario 3: Stream Frees Up
One of your 3 active requests completes.
Result: Queued requests automatically start as streams become available — no action needed on your side.
Queue Behavior
When all your streams are busy, new requests enter a queue:
- Max wait time: 60 seconds
- Automatic processing: Requests start as soon as a stream is free
- No manual intervention: Queue management is automatic
- Timeout response: If the wait exceeds 60 seconds, you receive a 408 Request Timeout
HTTP Status Codes
- 200 OK - Request processed successfully
- 202 Accepted - Request queued, will process soon
- 408 Request Timeout - Queue wait exceeded 60 seconds
- 429 Too Many Requests - Queue is full
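A client can map these four status codes to a next action. This is a minimal sketch of one reasonable mapping, not an official client library:

```python
def handle_status(code: int) -> str:
    """Map a queue-related status code to a client-side action."""
    actions = {
        200: "success",   # processed immediately
        202: "queued",    # will process as soon as a stream frees up
        408: "retry",     # queue wait exceeded 60s; retry or notify the user
        429: "back_off",  # queue is full; slow down before retrying
    }
    return actions.get(code, "error")
```

For example, `handle_status(408)` returns `"retry"`, which feeds naturally into the retry logic under "Handle Queue Timeouts Gracefully" below.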
Choosing the Right Tier
Solo (3 Streams) - Best for:
- Individual developers testing and building
- Low-traffic applications (< 100 requests/hour)
- Sequential workflows where requests happen one after another
- Development and staging environments
Example use case: Personal assistant chatbot with 50 users
Team (10 Streams) - Best for:
- Production applications with moderate traffic
- Parallel processing needs (bulk operations, batch jobs)
- Small teams or agencies serving multiple clients
- Consistent traffic throughout the day
Example use case: SaaS product with 500-1000 active users
Platform (Custom) - Best for:
- High-traffic applications (1000+ requests/hour)
- Enterprise deployments with strict SLA requirements
- Resellers serving multiple downstream customers
- Burst traffic patterns requiring large stream counts
Example use case: AI API reseller or high-volume data processing
Optimizing Stream Usage
1. Match Request Patterns to Tier
If most of your requests are:
- Sequential (chatbot conversations): Solo tier works great
- Parallel (batch document processing): Consider Team or Platform
2. Handle Queue Timeouts Gracefully
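One graceful pattern is to retry with exponential backoff when a 408 comes back. The sketch below wraps any request function; `send` is a stand-in for your actual API call (its shape — returning a `(status, body)` pair — is an assumption for illustration):

```python
import time

def with_retry(send, max_attempts=3, base_delay=1.0):
    """Retry a request that times out in the queue (HTTP 408).

    `send` is any callable returning (status_code, body); the real
    client call and payload are up to your application.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 408:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff before re-submitting the request.
            time.sleep(base_delay * 2 ** attempt)
    return status, body

# Demo: a fake sender that times out twice, then succeeds.
responses = iter([(408, None), (408, None), (200, "ok")])
status, body = with_retry(lambda: next(responses), base_delay=0)
```

Backoff matters here: if the queue is saturated, immediate retries only keep it saturated, while spacing retries out gives running requests time to free up streams.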
3. Monitor Your Usage
Check your dashboard (available at launch) to see:
- Current stream usage: How many streams are active right now
- Peak usage times: When you hit your stream limit most often
- Queue wait times: Average time requests spend in queue
- Timeout rate: How often requests exceed 60s queue wait
4. Upgrade Signals
Consider upgrading to a higher tier if:
- Queue timeouts happen regularly (>5% of requests)
- Average queue wait time exceeds 10 seconds
- You consistently max out streams during business hours
- You need to process batches faster
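The first two thresholds can be checked programmatically against your own request logs. The log shape below (dicts with `queue_wait_s` and `timed_out` keys) is an assumption for illustration, not an official dashboard export format:

```python
def upgrade_signals(log):
    """Evaluate the upgrade thresholds above against a request log."""
    n = len(log)
    timeout_rate = sum(1 for r in log if r["timed_out"]) / n
    avg_wait = sum(r["queue_wait_s"] for r in log) / n
    return {
        "timeout_rate_over_5_pct": timeout_rate > 0.05,  # >5% of requests
        "avg_wait_over_10_s": avg_wait > 10,             # >10s average wait
    }

# Sample log: 1 timeout in 10 requests, with elevated queue waits.
sample = [{"queue_wait_s": 12, "timed_out": False}] * 9 \
       + [{"queue_wait_s": 60, "timed_out": True}]
signals = upgrade_signals(sample)
```

In this sample, both signals fire (10% timeout rate, 16.8s average wait), which would suggest moving up a tier.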
Streams vs. Traditional Rate Limits
Traditional APIs limit you by:
- Requests per minute (RPM)
- Tokens per minute or per day
- Monthly usage quotas
S.O.T.A. SYSTEMS only limits:
- Concurrent streams (how many requests run simultaneously)
This means there are no request counts or token caps to track. The only constraint is parallelism, not total volume.
Real-World Examples
Example 1: Chatbot Application
Setup: Solo tier (3 streams), 100 daily users, average 10 messages/user
Analysis:
- Total daily requests: 1,000
- Peak concurrency: ~2-3 simultaneous conversations
- Verdict: Solo tier is perfect. Requests are naturally sequential (users wait for responses).
Example 2: Document Processing Pipeline
Setup: Processing 1,000 PDFs, extracting summaries
Without parallelism (Solo tier): each PDF is processed sequentially, one request at a time.
With parallelism (Team tier, 10 streams): up to 10 PDFs are summarized simultaneously.
Verdict: Team tier enables 10x faster processing through parallelism.
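The pipeline above can be sketched with a worker pool sized to the stream count, so all 10 streams stay busy without overflowing the queue. `summarize` is a hypothetical stand-in for the real summarization API call:

```python
from concurrent.futures import ThreadPoolExecutor

STREAMS = 10  # Team tier stream count

def summarize(doc):
    # Stand-in for the real summarization API call (hypothetical).
    return f"summary of {doc}"

docs = [f"doc_{i:04}.pdf" for i in range(1000)]

# Cap in-flight requests at the stream count: every stream stays
# busy, and no request sits in the queue risking a 408.
with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    summaries = list(pool.map(summarize, docs))
```

Sizing `max_workers` above the stream count gains nothing — the extra requests would just wait in the server-side queue — while sizing it below leaves paid-for streams idle.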
Example 3: API Reseller
Setup: Platform tier (e.g. 50 streams), serving 20 downstream customers
Analysis:
- Each customer might send 2-5 concurrent requests
- Peak load: 40-50 simultaneous requests across all customers
- Verdict: Platform tier with custom allocation matches your resale model.
Frequently Asked Questions
Does each API key get its own streams?
No. Streams are shared across all API keys in your account. If you have 3 streams and 2 API keys, they share the same 3-stream pool.
Can I buy extra streams without upgrading tiers?
Not currently. Stream counts are fixed per tier. We're considering add-on streams for Q2 2026.
What happens if I exceed 60s in the queue?
You receive a 408 Request Timeout error. Your application should retry the request or notify the user.
Do streaming responses (SSE) count as one stream?
Yes. A streaming response (like chat with stream: true) occupies one stream for its entire duration, from first token to last.
Can I see real-time stream usage?
Yes, your dashboard (available at launch) shows live stream usage and queue metrics.
Do failed requests count against my streams?
Failed requests (4xx/5xx errors) release their stream immediately after failing. A stream is only occupied for the duration of the actual processing attempt.