The 90-second proof
Three escalating proofs. Each is copy-paste-able. Each produces a verifiable result on your own machine. By the end you'll have:
- Proven determinism on your own key — 20 identical calls, byte-identical responses, SHA-256 verified
- Watched schema-locked decoding turn messy text into clean JSON with no retry loop
- Measured the cost & latency against your current LLM bill
Your sk-cogos-... key appears on the success page. Month-to-month, cancel any time.
Or run the open bench against our substrate first — same methodology, validates everything below before you spend a dollar.
1 The determinism proof
20 identical calls. Same prompt, same schema, same model. If the substrate is what we say it is, you should get 20 byte-identical responses and 1 unique SHA-256 hash. If you don't, the bench's job is to make that falsifiable in public.
Pure bash. No Python, no virtualenv, just curl + jq + sha256sum (or shasum -a 256 on macOS).
export COGOS_API_KEY=sk-cogos-YOUR_KEY_HERE
# 20 identical requests; hash each response, then count the distinct hashes.
# On macOS, replace sha256sum with: shasum -a 256
for i in {1..20}; do
  curl -s https://cogos.5ceos.com/v1/chat/completions \
    -H "Authorization: Bearer $COGOS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "cogos-tier-b",
      "messages": [{"role":"user","content":"What is 47 times 23?"}],
      "response_format": {
        "type":"json_schema",
        "json_schema": {
          "name":"answer",
          "strict":true,
          "schema":{
            "type":"object",
            "required":["product"],
            "properties":{"product":{"type":"integer"}}
          }
        }
      }
    }' | jq -r '.choices[0].message.content' | sha256sum
done | sort -u | wc -l
1 — one unique line across 20 calls. Determinism = 1.0000.
Run the same script against a hosted frontier API and the same prompt typically returns 3–8 unique lines at temperature=0. The mechanism section of the whitepaper explains why.
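The uniqueness check above can also be expressed as a count of distinct SHA-256 digests. The following is a minimal sketch, assuming the 20 response bodies have already been collected into a list; the function name determinism_report and the sample strings are illustrative and not part of any client library.

```python
import hashlib


def determinism_report(responses):
    """Return (unique_digest_count, determinism_score) for response strings."""
    if not responses:
        raise ValueError("need at least one response")
    digests = {hashlib.sha256(r.encode("utf-8")).hexdigest() for r in responses}
    unique = len(digests)
    return unique, 1.0 / unique


# 20 byte-identical responses: one digest.
identical = ['{"product":1081}'] * 20
# Same answer, but two responses differ in whitespace: the bytes differ,
# so the digests differ too.
drifting = ['{"product":1081}'] * 18 + ['{"product": 1081}', '{ "product":1081 }']

print(determinism_report(identical))
print(determinism_report(drifting))
```

The comparison is on bytes, not on meaning: a response that is semantically the same but formatted differently still counts as a distinct output.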
What just happened
- The model was asked for { "product": <integer> }.
- The decoder was physically constrained to emit tokens that keep the partial output schema-valid. Non-conforming tokens have zero probability mass: not retried, prevented.
- Sampling settings pinned (temperature=0, top_p=1, seed locked). Same input + same model snapshot → same bytes.
- The X-Cogos-Schema-Enforced: 1 response header proves the decoder hook was active for this call.
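To make "prevented, not retried" concrete, here is a toy sketch of decoder-side masking. It is not the actual decoder: the eight-token vocabulary, the stand-in model fake_logits, and the grammar check allowed_tokens are all hypothetical, chosen only to show how masking disallowed tokens before a greedy (temperature 0) pick forces schema-valid output.

```python
import math

PREFIX = '{"product":'
TARGET = PREFIX + "1081}"            # what the toy model "knows" is the answer
DIGITS = {"0", "1", "8"}
# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = [PREFIX, "Sure", "!", "0", "1", "8", "}", "<eos>"]


def fake_logits(text):
    """Hypothetical model: it prefers chatty output but also knows the answer."""
    logits = {tok: 0.0 for tok in VOCAB}
    chatty = {"": "Sure", "Sure": "!", "Sure!": "<eos>"}
    if text in chatty:
        logits[chatty[text]] = 5.0           # unconstrained favourite
    for tok in VOCAB:
        if tok != "<eos>" and TARGET.startswith(text + tok):
            logits[tok] = 3.0                # token that continues the answer
    if text == TARGET:
        logits["<eos>"] = 3.0
    return logits


def allowed_tokens(text):
    """Tokens that keep `text` a valid prefix of {"product":<integer>}."""
    if text == "":
        return {PREFIX}
    if text.endswith("}"):
        return {"<eos>"}
    number = text[len(PREFIX):]
    if number == "":
        return set(DIGITS)                   # at least one digit is required
    if number == "0":
        return {"}"}                         # JSON forbids leading zeros
    return DIGITS | {"}"}


def decode(constrained, max_steps=16):
    text = ""
    for _ in range(max_steps):
        logits = fake_logits(text)
        if constrained:
            ok = allowed_tokens(text)
            # Disallowed tokens get -inf, i.e. zero probability after softmax.
            logits = {t: (s if t in ok else -math.inf) for t, s in logits.items()}
        token = max(logits, key=logits.get)  # greedy pick, i.e. temperature 0
        if token == "<eos>":
            break
        text += token
    return text


print(decode(constrained=False))
print(decode(constrained=True))
```

Without the mask the toy model emits its chatty favourite; with the mask, the same logits can only produce a string that matches the schema, so there is nothing to retry.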
2 Schema-locked extraction from messy text
The actual job most production LLM features are doing: turn a paragraph of human prose into a row of structured data. Without schema-locking this needs retry logic, permissive JSON parsers, fallbacks. With it, the output is the schema.
curl -s https://cogos.5ceos.com/v1/chat/completions \
  -H "Authorization: Bearer $COGOS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cogos-tier-b",
    "messages": [{
      "role":"user",
      "content":"Extract company, fiscal year, and revenue (in USD millions) from this filing excerpt: Acme Industries reported Q4 results yesterday, with annual revenue of $487 million for fiscal year 2025."
    }],
    "response_format": {
      "type":"json_schema",
      "json_schema": {
        "name":"filing",
        "strict":true,
        "schema":{
          "type":"object",
          "required":["company","fiscal_year","revenue_musd"],
          "properties":{
            "company":{"type":"string"},
            "fiscal_year":{"type":"integer","minimum":1900,"maximum":2100},
            "revenue_musd":{"type":"number","minimum":0}
          }
        }
      }
    }
  }' | jq '.choices[0].message.content'
"{\"company\":\"Acme Industries\",\"fiscal_year\":2025,\"revenue_musd\":487}"
Guaranteed: JSON parses, schema validates, types check, fiscal_year falls in [1900,2100], revenue_musd is non-negative. By construction. You did not write a retry loop.
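If you want to confirm those guarantees on your side rather than take them on trust, a single parse plus a few checks is enough. The sketch below uses only the standard library and hand-codes the constraints from the schema above; validate_filing and is_number are illustrative helper names, and the checks approximate JSON Schema rather than implement it in full. The input is the message content itself, which is what jq -r would print (without -r, jq shows it as an escaped JSON string, as in the output above).

```python
import json

raw = '{"company":"Acme Industries","fiscal_year":2025,"revenue_musd":487}'


def is_number(value, integer_only=False):
    """True for JSON numbers; bool is excluded because it subclasses int."""
    if isinstance(value, bool):
        return False
    return isinstance(value, int) or (not integer_only and isinstance(value, float))


def validate_filing(content):
    """Parse the content and re-check the schema's constraints by hand."""
    row = json.loads(content)
    if not isinstance(row, dict):
        raise ValueError("expected a JSON object")
    problems = []
    if not isinstance(row.get("company"), str):
        problems.append("company must be a string")
    year = row.get("fiscal_year")
    if not (is_number(year, integer_only=True) and 1900 <= year <= 2100):
        problems.append("fiscal_year must be an integer in [1900, 2100]")
    revenue = row.get("revenue_musd")
    if not (is_number(revenue) and revenue >= 0):
        problems.append("revenue_musd must be a non-negative number")
    if problems:
        raise ValueError("; ".join(problems))
    return row


row = validate_filing(raw)
print(row["company"], row["fiscal_year"], row["revenue_musd"])
```

This is a one-shot check with no loop: if it ever raised, that would be a finding to report, not a case to retry.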
What you didn't have to write
# The loop you don't need.
# With a hosted provider that doesn't enforce at the decoder:
for attempt in range(MAX_RETRIES):
    raw = upstream_llm_call(prompt)
    try:
        parsed = json.loads(strip_markdown_fences(raw))
        jsonschema.validate(parsed, my_schema)
        break
    except (json.JSONDecodeError, jsonschema.ValidationError) as e:
        log.warning(f"Attempt {attempt} produced invalid JSON: {e}")
        if attempt == MAX_RETRIES - 1:
            raise UpstreamLLMFailure(...)
        prompt = augment_with_correction_prompt(prompt, raw, e)
        time.sleep(backoff(attempt))
That whole block, with its 0.5–3% silent failure rate, doesn't exist in a CircaOS codebase. Schema-validity is 1.0000 by construction.
3 Full benchmark — determinism, latency, cost
Same 20-call experiment, but with proper measurement: SHA-256 hash count, p50/p95 latency, cost per call, comparison to a frontier-API baseline. Save as cogos_demo.py and run it with python3 cogos_demo.py:
#!/usr/bin/env python3
"""CircaOS 90-second proof: determinism + latency + cost."""
import hashlib, json, os, statistics, sys, time, urllib.request, urllib.error

KEY = os.environ.get("COGOS_API_KEY")
if not KEY:
    sys.exit("Set COGOS_API_KEY in your environment first.")

URL = "https://cogos.5ceos.com/v1/chat/completions"
N = 20
payload = {
    "model": "cogos-tier-b",
    "messages": [{"role": "user", "content": "What is 47 times 23?"}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "answer",
            "strict": True,
            "schema": {
                "type": "object",
                "required": ["product"],
                "properties": {"product": {"type": "integer"}},
            },
        },
    },
}

hashes, latencies_ms = set(), []
for i in range(N):
    t0 = time.perf_counter()
    req = urllib.request.Request(
        URL,
        method="POST",
        headers={
            "Authorization": f"Bearer {KEY}",
            "Content-Type": "application/json",
        },
        data=json.dumps(payload).encode(),
    )
    with urllib.request.urlopen(req, timeout=30) as r:
        body = json.loads(r.read())
    elapsed_ms = (time.perf_counter() - t0) * 1000
    content = body["choices"][0]["message"]["content"]
    digest = hashlib.sha256(content.encode()).hexdigest()
    hashes.add(digest)
    latencies_ms.append(elapsed_ms)
    # Print this call's own digest; indexing into a set returns an arbitrary one.
    print(f" call {i+1:2d}/{N} {elapsed_ms:6.0f}ms hash={digest[:12]}")

uniq = len(hashes)
det_score = 1.0 / uniq
# Latency stats cover warm calls only: the first call may include a cold start,
# and one multi-second outlier would dominate p95 at N = 20.
warm_ms = latencies_ms[1:]
p50 = statistics.median(warm_ms)
p95 = statistics.quantiles(warm_ms, n=20)[18]  # last of 19 cut points = 95th percentile
# Operator Pro: $99 / 500,000 requests = $0.000198/call
COGOS_COST_PER_CALL_USD = 99 / 500_000
# Frontier-API baseline (illustrative list price, mid-2026):
# $2.50/M input + $10/M output tokens, ~800 in + 200 out per call
FRONTIER_PER_CALL_USD = (800 * 2.5 + 200 * 10) / 1_000_000
print()
print(f" N = {N}")
print(f" Unique outputs = {uniq} (target: 1)")
print(f" Determinism score = {det_score:.4f} (target: 1.0000)")
print(f" Latency p50 = {p50:.0f}ms")
print(f" Latency p95 = {p95:.0f}ms")
print(f" Cost on Operator Pro = ${N * COGOS_COST_PER_CALL_USD:.4f}")
print(f" Frontier-API equiv list = ${N * FRONTIER_PER_CALL_USD:.4f} ({FRONTIER_PER_CALL_USD/COGOS_COST_PER_CALL_USD:.1f}x more)")
call 1/20 1872ms hash=a3f8c91b4d20 (cold start)
call 2/20 186ms hash=a3f8c91b4d20
call 3/20 174ms hash=a3f8c91b4d20
...
N = 20
Unique outputs = 1 (target: 1)
Determinism score = 1.0000 (target: 1.0000)
Latency p50 = 183ms
Latency p95 = 412ms
Cost on Operator Pro = $0.0040
Frontier-API equiv list = $0.0800 (20.2x more)
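One detail worth keeping in mind when reading p95 from a run this small: with only 20 samples, a single cold-start call largely determines the 95th percentile. The sketch below illustrates this with made-up warm latencies plus the 1872 ms cold start from the sample output; the helper p50_p95 is illustrative, and the warm values are invented for the example, not measurements.

```python
import statistics

# 19 invented warm latencies (170, 173, ..., 224 ms) and one cold start.
warm = [170 + 3 * i for i in range(19)]
cold = 1872


def p50_p95(samples):
    """Median and 95th percentile using the standard library's quantiles."""
    cuts = statistics.quantiles(samples, n=20)   # 19 cut points: 5%, 10%, ..., 95%
    return statistics.median(samples), cuts[18]


with_cold = p50_p95([cold] + warm)
warm_only = p50_p95(warm)
print("including cold start:", with_cold)
print("warm calls only:     ", warm_only)
```

The median barely moves, but p95 jumps by more than a second once the cold call is included, so it helps to state which of the two a reported p95 refers to.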
What you just proved
| Property | Your measurement | Implication for production |
|---|---|---|
| Determinism | 1 unique SHA-256 across 20 calls | Test fixtures stay valid. Cache hit rates jump to ~100%. Replay is real. |
| Schema validity | 100% of responses validated by construction | Delete your retry loop. Delete your permissive parser. Delete your fallback path. |
| Latency | p50 ~180ms warm, p95 ~400ms | Inside any reasonable user-facing budget. Cold start ~7s — not great, real. |
| Cost | ~20× below frontier-API list | A $4K/mo frontier-API bill becomes a roughly $200/mo CircaOS bill at the same call volume. |
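The cache claim in the determinism row follows from a simple property: if identical requests always return identical bytes, a response can be stored under a hash of the request. The following is a minimal sketch of such a cache, assuming responses are deterministic for a pinned model snapshot; cache_key, ResponseCache, and fake_upstream are hypothetical names, and the stand-in upstream returns a fixed string instead of calling any API.

```python
import hashlib
import json


def cache_key(payload):
    """SHA-256 of the canonical JSON form, so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache keyed by request hash (illustrative only)."""

    def __init__(self, call_upstream):
        self._call = call_upstream
        self._store = {}
        self.upstream_calls = 0

    def complete(self, payload):
        key = cache_key(payload)
        if key not in self._store:
            self.upstream_calls += 1
            self._store[key] = self._call(payload)
        return self._store[key]


def fake_upstream(payload):
    # Stand-in for the real HTTP call; always returns the same bytes.
    return '{"product":1081}'


question = [{"role": "user", "content": "What is 47 times 23?"}]
a = {"model": "cogos-tier-b", "messages": question}
b = {"messages": question, "model": "cogos-tier-b"}   # same request, keys reordered

cache = ResponseCache(fake_upstream)
for payload in (a, b, a):
    cache.complete(payload)
print(cache.upstream_calls)
```

Because the model name is part of the payload, it is part of the key; a change of model or schema produces a new key rather than a stale hit.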
Next
If the proof checked out:
- Read the technical whitepaper for the mechanism, the bench methodology, and the explicit list of things CircaOS does not do.
- Clone the open determinism bench, run it against your own key, compare to the latest results/ commit. Any divergence is a publishable finding.
- Pick a tier: $29 to $100K/yr, month-to-month, cancel any time.
Found a bug, an unsupported edge case, or a measurement that doesn't replicate? Open an issue on the bench repo or email support@5ceos.com. Technical objections are the highest-value feedback we get.