Shared Edge Instance

Zero-ops, multi-tenant LLM gateway — top up and start calling every model


The Shared Edge Instance is the default deployment of Hydite Vtslx AO. There is no infrastructure to provision: claim an API key in the dashboard and you can immediately call every supported provider and model. All tenants share a multi-region AO gateway cluster operated by Hydite — we handle HA, autoscaling, security patches and cost optimisation for you.

When the shared instance is right

Best fit                         | Not the right fit
Solo devs, startups, AI app PoCs | Regulated workloads requiring data residency
~M to ~10M tokens / month        | Sustained > 1k RPM peak per single instance
One key, many providers          | Dedicated egress IPs or 99.99% SLA
Pay-as-you-go pricing            | HIPAA / SOC2 / China-MLPS-3 audit

If your workload falls in the right-hand column, jump to Dedicated Instance.

Architecture

                  ┌──────────────────────────────┐
Your App ── HTTPS │ Edge Gateway (Multi-Region)  │
                  │  • TLS / Auth / Rate limit   │
                  │  • Anomaly drop / Routing    │
                  └──────────────┬───────────────┘

                  ┌──────────────▼───────────────┐
                  │ AO Routing Engine            │
                  │  • Provider routing & FB     │
                  │  • Cost accounting           │
                  │  • Semantic cache            │
                  │  • Guardrails / Tools        │
                  └──────────────┬───────────────┘
            ┌────────────────────┼────────────────────┐
     OpenAI · Anthropic · Google · DeepSeek · Qwen · Zhipu · …

  • Edge Gateway terminates TLS, authenticates virtual keys, enforces global rate limits and runs anomaly drops.
  • AO Routing Engine is OpenAI-compatible, handles provider routing, automatic fallbacks, semantic caching and cost accounting.
  • A built-in high-speed cache plus a persistence layer keep keys, usage and quotas durable.
  • Requests are handled in the edge region closest to the caller, over TLS 1.3 and HTTP/2.

1. Three-step onboarding

Step 1 — Get an API key

The API Keys dashboard lists every issued virtual key. New users obtain their first key via:

  1. Activation Code — redeem a one-time code on the Activation Codes page (typical for partner / referral distribution).
  2. Credit Code — top up any USD amount on the Credit Codes page; usage is metered against the balance in real time.
  3. Plan subscription — subscribe to a Pro / Team plan on Plans for monthly bundled quota and higher limits.
  4. Enterprise invoice — sales-issued accounts can mint keys via the Workspaces screen.

Issued keys look like:

sk-hydite-3f2a8b9c0d1e4f5a6b7c8d9e0f1a2b3c

⚠️ The full key is shown only once at creation time — store it safely. Lost a key? Use Regenerate in the dashboard.

Step 2 — Pick a Channel (optional)

Every account ships with a Default Channel wired to Hydite's curated, best-value model mix. Create extra channels in Channels when you need to:

  • Isolate dev / staging / prod traffic
  • Hand a tenant a custom model whitelist
  • Bill / route a specific workload separately

Each Channel maps to its own model bundle, so routing, rate limits and cost accounting are all scoped per channel.

Step 3 — Make your first call

export HYDITE_API_KEY=sk-hydite-...

curl https://api.hydite.com/v1/chat/completions \
  -H "Authorization: Bearer $HYDITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role":"user","content":"hi"}]
  }'

Or reuse the official OpenAI SDK; only the API key and base URL change:

import os

from openai import OpenAI

# Point the official OpenAI client at the Hydite gateway.
client = OpenAI(
    api_key=os.environ["HYDITE_API_KEY"],
    base_url="https://api.hydite.com/v1",
)

resp = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

Full endpoint catalogue → API Reference.

2. Quotas & billing

The Billing and Revenue dashboards expose the following in real time:

  • Spend / Budget drill-down by key / user / team / channel.
  • Token usage roll-up by model and time bucket (hour / day / month).
  • Top Models / Top Users leaderboards for fast cost attribution.

Every request is billed at prompt + completion token cost, computed by the AO routing engine against the latest provider price sheets and normalised to USD. Transactions are retained for at least 90 days and exposed via /spend/logs.
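
As a rough illustration of the billing rule above, the sketch below computes the cost of one request as prompt cost plus completion cost. The model name and the per-million-token prices are made-up placeholders, not actual Hydite or provider rates, and `request_cost_usd` is a hypothetical helper rather than part of any Hydite API.

```python
from decimal import Decimal

# Hypothetical USD prices per 1M tokens; real values come from provider price sheets.
PRICE_PER_MTOK = {
    "example-model": {"prompt": Decimal("3.00"), "completion": Decimal("15.00")},
}

def request_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost of one request in USD: prompt cost + completion cost."""
    price = PRICE_PER_MTOK[model]
    million = Decimal(1_000_000)
    return (
        price["prompt"] * prompt_tokens / million
        + price["completion"] * completion_tokens / million
    )

cost = request_cost_usd("example-model", prompt_tokens=1_200, completion_tokens=300)
print(cost)
```

With these placeholder prices, 1,200 prompt tokens cost 0.0036 USD and 300 completion tokens cost 0.0045 USD, for 0.0081 USD in total. `Decimal` is used here so that small per-request amounts do not pick up floating-point rounding noise.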

Quota dimensions

You can attach any combination of the following limits to a key (or its parent team / org):

Dimension   | Behaviour
Budget      | Hard USD cap — exceeding returns 429 budget_exceeded
Soft Budget | Alert-only threshold
RPM / TPM   | Requests / tokens per minute
Models      | Allow / deny list
Expires     | Automatic deactivation on a date
Allowed IPs | Egress IP allow-list (leak protection)

Limits stack across the three-tier structure: Key ⊆ Team ⊆ Organization. The strictest layer wins.
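
The "strictest layer wins" rule can be read as taking the minimum of whatever limits are set at each tier. The sketch below assumes that an unset limit is represented as `None`; `effective_limit` and the sample RPM values are hypothetical and only illustrate the rule.

```python
from typing import Optional

def effective_limit(*limits: Optional[float]) -> Optional[float]:
    """Return the strictest (smallest) limit; None means no limit at that tier."""
    set_limits = [limit for limit in limits if limit is not None]
    return min(set_limits) if set_limits else None

# Hypothetical RPM limits on a key, its team and its organization.
key_rpm, team_rpm, org_rpm = 120, 60, None
print(effective_limit(key_rpm, team_rpm, org_rpm))
```

Here the key allows 120 RPM but its team allows only 60, so the key is effectively capped at 60 RPM.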

3. Rate limiting & anomaly protection

The shared instance ships with several anti-abuse layers enabled by default:

  • Gateway layer — global sliding-window limits per IP / key (default 60 RPM, 1M TPM, configurable in the dashboard).

  • Routing layer — per-key RPM / TPM counters returning a standard 429:

    { "error": { "type": "rate_limit_error", "code": "rpm_limit", "message": "Rate limit exceeded" } }
  • Anomaly Detection — the Anomaly dashboard auto-flags:

    • Spend spikes > 10× historical mean for a single key
    • Same key concurrently used from multiple regions / IPs (likely leak)
    • Suspicious token sizes (potential prompt injection / data exfil)
    • Sudden error-rate spikes (jailbreak attempt or upstream issue)

    When a rule matches, the key is paused automatically and an alert is emailed; resuming it takes one click.

  • Guardrails — opt in to PII redaction, content moderation and prompt-injection detection per key or channel via /guardrails/apply_guardrail.
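
To make the sliding-window limit in the gateway layer concrete, here is a minimal in-memory sketch of a sliding-window limiter. It is not Hydite's implementation: the class name, the timestamp-log approach and the single-process design are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Deque, Optional

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling `window_s`-second window."""

    def __init__(self, limit: int = 60, window_s: float = 60.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._stamps: Deque[float] = deque()  # timestamps of accepted requests

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have left the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # a gateway would answer with HTTP 429 here
        self._stamps.append(now)
        return True

# Three requests per 60 s: the fourth is rejected, a later one passes again.
limiter = SlidingWindowLimiter(limit=3, window_s=60.0)
results = [limiter.allow(now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.5)]
print(results)
```

The explicit `now` argument exists only so the example is deterministic; in normal use `allow()` reads the monotonic clock. A multi-region gateway would need shared state for the counters, which this sketch leaves out.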

4. Routing & resilience

The AO routing engine enables the full multi-provider routing toolkit on the shared instance:

  • Latency-based routing — pick the lowest P95 deployment.
  • Cost-based routing — pick the cheapest equivalent model.
  • Fallback chains — e.g. claude-sonnet-4-5 → claude-3-7-sonnet → gpt-4o.
  • Retry policy — automatically retries idempotent requests that fail with a 5xx (2 retries by default, exponential backoff).
  • Semantic cache — channel-level toggle; cache hits cost zero tokens.

Every routing decision, fallback and retry is logged to /spend/logs and aggregated in System Health.
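
The gateway applies fallbacks and retries server-side, so clients do not need to implement them. Purely to illustrate how a fallback chain and a retry policy combine, the sketch below tries each model in order and retries retryable failures with exponential backoff. `UpstreamError`, `call_with_fallback`, the 0.5 s base delay and the reading of "2x" as two retries are assumptions, not documented Hydite behaviour.

```python
import time
from typing import Callable, Optional, Sequence

class UpstreamError(Exception):
    """Stands in for a retryable 5xx response from a provider."""

def call_with_fallback(
    send: Callable[[str], str],
    chain: Sequence[str],
    max_retries: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in `chain`; retry each one up to `max_retries` times."""
    last_error: Optional[Exception] = None
    for model in chain:
        for attempt in range(max_retries + 1):
            try:
                return send(model)
            except UpstreamError as exc:
                last_error = exc
                if attempt < max_retries:
                    sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, ...
    raise RuntimeError("all models in the fallback chain failed") from last_error

def fake_send(model: str) -> str:
    """Test double: the first model in the chain is always down."""
    if model == "claude-sonnet-4-5":
        raise UpstreamError("503 from upstream")
    return f"answered by {model}"

chain = ["claude-sonnet-4-5", "claude-3-7-sonnet", "gpt-4o"]
answer = call_with_fallback(fake_send, chain, sleep=lambda seconds: None)
print(answer)
```

In this run the first model fails three times (one attempt plus two retries), after which the second model in the chain answers. The `sleep` parameter is injectable so the example runs instantly.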

5. Observability

Dashboard     | Data source                       | Purpose
Overview      | /spend/* + gateway metrics        | KPI overview (QPS / Spend / Latency / Error %)
System Health | /health/* + per-component probes  | Live status of gateway · routing engine · cache · database
Anomaly       | Call logs + risk rules            | Anomaly event stream
API Keys      | /key/list                         | All issued keys & status
Channels      | Custom model bundles              | Channel list / default channel

For external SIEM / Grafana wiring use:

  • GET /metrics — Prometheus
  • GET /spend/logs — JSON call detail
  • Webhooks — configure under Profile → Notifications
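
As one possible starting point for wiring call detail into your own tooling, the sketch below fetches GET /spend/logs and sums spend per model. Several things here are assumptions: that the endpoint lives at the root of api.hydite.com, that it accepts the same Bearer key as the chat endpoint, that it returns a JSON array, and that each record carries `model` and `spend` fields. Check the API Reference for the actual schema.

```python
import json
import urllib.request
from collections import defaultdict
from typing import Dict, Iterable, List

BASE_URL = "https://api.hydite.com"  # host taken from the examples on this page

def fetch_spend_logs(api_key: str) -> List[dict]:
    """GET /spend/logs; assumes Bearer auth and a JSON array response."""
    req = urllib.request.Request(
        f"{BASE_URL}/spend/logs",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def spend_by_model(records: Iterable[dict]) -> Dict[str, float]:
    """Sum the (assumed) `spend` field per (assumed) `model` field."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["model"]] += float(record["spend"])
    return dict(totals)

# Offline sample using the assumed record shape; no network call is made here.
sample = [
    {"model": "deepseek-v3", "spend": 0.25},
    {"model": "gpt-4o", "spend": 0.5},
    {"model": "deepseek-v3", "spend": 0.25},
]
totals = spend_by_model(sample)
print(totals)
```

To run it against live data, pass the result of `fetch_spend_logs(os.environ["HYDITE_API_KEY"])` to `spend_by_model` (after `import os`) and adjust the field names to match the real response.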

6. Data compliance

  • Shared edge does not persist prompt or completion bodies by default — only usage metadata.
  • Need full audit logs? Enable Logging per channel (independent encryption, separate bucket, configurable retention).
  • All transport is TLS 1.3 with HSTS on every host.
  • We do not train on your data, and the metadata: { "no_log": true } opt-out is forwarded to providers that honour it.
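
A request that carries the `no_log` opt-out might look like the sketch below. It assumes the `metadata` object sits at the top level of the chat completion request body, which this page does not state explicitly; `build_chat_payload` is a hypothetical helper.

```python
import json

def build_chat_payload(model: str, prompt: str, no_log: bool = False) -> dict:
    """Build a chat completion body, optionally with the no_log opt-out."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if no_log:
        # Assumed placement: a top-level "metadata" object in the request body.
        payload["metadata"] = {"no_log": True}
    return payload

body = build_chat_payload("deepseek-v3", "ping", no_log=True)
print(json.dumps(body, indent=2))
```

With the OpenAI Python SDK, extra top-level fields such as this one can be sent through the `extra_body` argument of `client.chat.completions.create`.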

7. Shared vs Dedicated

               | Shared Edge              | Dedicated
Location       | Hydite multi-region edge | Single-tenant (cloud / on-prem)
Entry cost     | Pay-as-you-go            | Monthly base + usage
Custom routing | Dashboard presets        | Full YAML control
Data residency | Not persisted            | 100% customer-controlled
SLA            | 99.9%                    | 99.95%+
Sweet spot     | < 1k sustained RPM       | Any scale

Upgrade path: Workspaces → Upgrade to Dedicated — your key and quotas migrate seamlessly with no code changes.

Next steps