Shared Edge Instance
Zero-ops, multi-tenant LLM gateway — top up and start calling every model
The Shared Edge Instance is the default deployment of Hydite Vtslx AO. There is no infrastructure to provision: claim an API key in the dashboard and you can immediately call every supported provider and model. All tenants share a multi-region AO gateway cluster operated by Hydite — we handle HA, autoscaling, security patches and cost optimisation for you.
When the shared instance is right
| Best fit | Not the right fit |
|---|---|
| Solo devs, startups, AI app PoCs | Regulated workloads requiring data residency |
| ~M to ~10M tokens / month | Sustained > 1k RPM peak per single instance |
| One key, many providers | Dedicated egress IPs or 99.99% SLA |
| Pay-as-you-go pricing | HIPAA / SOC2 / China-MLPS-3 audit |
If your workload falls in the right-hand column, go to Dedicated Instance.
Architecture

```text
                   ┌──────────────────────────────┐
Your App ── HTTPS  │ Edge Gateway (Multi-Region)  │
                   │ • TLS / Auth / Rate limit    │
                   │ • Anomaly drop / Routing     │
                   └──────────────┬───────────────┘
                                  │
                   ┌──────────────▼───────────────┐
                   │ AO Routing Engine            │
                   │ • Provider routing & FB      │
                   │ • Cost accounting            │
                   │ • Semantic cache             │
                   │ • Guardrails / Tools         │
                   └──────────────┬───────────────┘
             ┌────────────────────┼────────────────────┐
      OpenAI · Anthropic · Google · DeepSeek · Qwen · Zhipu · …
```

- Edge Gateway terminates TLS, authenticates virtual keys, enforces global rate limits and runs anomaly drops.
- AO Routing Engine is OpenAI-compatible, handles provider routing, automatic fallbacks, semantic caching and cost accounting.
- A built-in high-speed cache plus a persistence layer keep keys, usage and quotas durable.
- Requests are served from the edge region closest to the caller, over TLS 1.3 and HTTP/2.
1. Three-step onboarding
Step 1: Get an API key
The API Keys dashboard lists every issued virtual key. New users obtain their first key via:
- Activation Code — redeem a one-time code on the Activation Codes page (typical for partner / referral distribution).
- Credit Code — top up any USD amount on the Credit Codes page; usage is metered against the balance in real time.
- Plan subscription — subscribe to a Pro / Team plan on Plans for monthly bundled quota and higher limits.
- Enterprise invoice — sales-issued accounts can mint keys via the Workspaces screen.
Issued keys look like:
```text
sk-hydite-3f2a8b9c0d1e4f5a6b7c8d9e0f1a2b3c
```

⚠️ The full key is shown only once at creation time, so store it safely. Lost a key? Use Regenerate in the dashboard.
Step 2: Pick a Channel (optional)
Every account ships with a Default Channel wired to Hydite's curated, best-value model mix. Create extra channels in Channels when you need to:
- Isolate dev / staging / prod traffic
- Hand a tenant a custom model whitelist
- Bill / route a specific workload separately
Each Channel maps to its own model bundle, so routing, rate limits and cost accounting are all scoped per channel.
Step 3: Make your first call

```bash
export HYDITE_API_KEY=sk-hydite-...

curl https://api.hydite.com/v1/chat/completions \
  -H "Authorization: Bearer $HYDITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role":"user","content":"hi"}]
  }'
```

Or reuse the official OpenAI SDK (zero code changes):
```python
import os

from openai import OpenAI

# Point the official OpenAI client at the Hydite gateway.
client = OpenAI(
    api_key=os.environ["HYDITE_API_KEY"],
    base_url="https://api.hydite.com/v1",
)
resp = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```

Full endpoint catalogue → API Reference.
2. Quotas & billing
The Billing and Revenue dashboards expose the following in real time:
- Spend / Budget drill-down by key / user / team / channel.
- Token usage roll-up by model and time bucket (hour / day / month).
- Top Models / Top Users leaderboards for fast cost attribution.
Every request is billed at prompt + completion token cost, computed by the AO routing engine against the latest provider price sheets and normalised to USD. Transactions are retained for at least 90 days and exposed via /spend/logs.
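As a worked illustration of the prompt + completion billing rule, the sketch below computes the USD cost of a single request. The model name, the per-million-token prices and the function name are placeholders invented for this example; they are not Hydite's or any provider's real price sheet.

```python
from decimal import Decimal

# Hypothetical price sheet in USD per 1M tokens (illustrative values only).
PRICES_PER_MILLION = {
    "example-model": {"prompt": Decimal("3.00"), "completion": Decimal("15.00")},
}


def request_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Cost = prompt tokens x prompt price + completion tokens x completion price."""
    price = PRICES_PER_MILLION[model]
    million = Decimal(1_000_000)
    return (
        Decimal(prompt_tokens) * price["prompt"] / million
        + Decimal(completion_tokens) * price["completion"] / million
    )


cost = request_cost_usd("example-model", prompt_tokens=1_200, completion_tokens=300)
print(cost)
```

With these placeholder prices, 1,200 prompt tokens cost 0.0036 USD and 300 completion tokens cost 0.0045 USD, for a total of 0.0081 USD. `Decimal` is used so that small per-request amounts do not accumulate floating-point error.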
Quota dimensions
You can attach any combination of the following limits to a key (or its parent team / org):
| Dimension | Behaviour |
|---|---|
| Budget | Hard USD cap — exceeding returns 429 budget_exceeded |
| Soft Budget | Alert-only threshold |
| RPM / TPM | Requests / tokens per minute |
| Models | Allow / deny list |
| Expires | Automatic deactivation on a date |
| Allowed IPs | Allow-list of client IPs permitted to use the key (leak protection) |
Limits stack across the three-tier structure: Key ⊆ Team ⊆ Organization. The strictest layer wins.
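One way to picture the "strictest layer wins" rule is as a per-dimension minimum over the three layers. The sketch below is only a mental model: the `Limits` class, its field names and the treatment of unset limits as "no limit" are assumptions made for illustration, not the gateway's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    """Limits set at one layer; None means no limit is set at this layer."""
    budget_usd: Optional[float] = None
    rpm: Optional[int] = None
    tpm: Optional[int] = None


def _strictest(*values):
    """Return the smallest configured value, or None if no layer sets one."""
    configured = [v for v in values if v is not None]
    return min(configured) if configured else None


def effective_limits(key: Limits, team: Limits, org: Limits) -> Limits:
    """Combine Key, Team and Organization limits; the strictest layer wins."""
    return Limits(
        budget_usd=_strictest(key.budget_usd, team.budget_usd, org.budget_usd),
        rpm=_strictest(key.rpm, team.rpm, org.rpm),
        tpm=_strictest(key.tpm, team.tpm, org.tpm),
    )


result = effective_limits(
    key=Limits(budget_usd=50.0, rpm=120),
    team=Limits(budget_usd=200.0, rpm=60, tpm=500_000),
    org=Limits(tpm=1_000_000),
)
print(result)
```

In this example the key's 120 RPM is capped by the team's 60 RPM, while the key's own 50 USD budget is tighter than the team's 200 USD and therefore applies.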
3. Rate limiting & anomaly protection
The shared instance ships several anti-abuse layers turned on by default:
- Gateway layer: global sliding-window limits per IP / key (default 60 RPM, 1M TPM, configurable in the dashboard).
- Routing layer: per-key RPM / TPM counters returning a standard 429:

  ```json
  { "error": { "type": "rate_limit_error", "code": "rpm_limit", "message": "Rate limit exceeded" } }
  ```

- Anomaly Detection: the Anomaly dashboard auto-flags:
  - Spend spikes > 10× historical mean for a single key
  - Same key concurrently used from multiple regions / IPs (likely leak)
  - Suspicious token sizes (potential prompt injection / data exfil)
  - Sudden error-rate spikes (jailbreak attempt or upstream issue)

  A matching rule automatically pauses the key and emails an alert; you can resume the key with one click.
- Guardrails: opt in to PII redaction, content moderation and prompt-injection detection per key or channel via /guardrails/apply_guardrail.
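The sliding-window limit at the gateway layer can be pictured with a small in-memory limiter. This sketch illustrates the general sliding-window-log technique only; the class name, the per-key timestamp queue and the injectable clock are assumptions made for the example, and Hydite's real implementation is not described in this document.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within any `window_s`-second window."""

    def __init__(self, limit: int = 60, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a gateway would answer with HTTP 429 here
        hits.append(now)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=3, window_s=60.0, clock=lambda: fake_now[0])
first_three = [limiter.allow("sk-demo") for _ in range(3)]
fourth = limiter.allow("sk-demo")      # rejected: window is full
fake_now[0] = 61.0
after_window = limiter.allow("sk-demo")  # accepted: old hits have expired
```

A production limiter shared across regions would need a shared store rather than process memory; the sketch keeps state local to stay self-contained.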
4. Routing & resilience
The AO routing engine enables the full multi-provider routing toolkit on shared:
- Latency-based routing: pick the lowest P95 deployment.
- Cost-based routing: pick the cheapest equivalent model.
- Fallback chains: e.g. claude-sonnet-4-5 → claude-3-7-sonnet → gpt-4o.
- Retry policy: auto-retry idempotent requests on 5xx (default 2 retries, exponential backoff).
- Semantic cache: channel-level toggle; cache hits cost zero tokens.
Every routing decision, fallback and retry is logged to /spend/logs and aggregated in System Health.
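To make the interaction between fallback chains and the retry policy concrete, the sketch below walks a model chain and retries each model with exponential backoff before moving on. It models the behaviour on the client side purely for illustration: `UpstreamError`, `call_with_fallback`, the delay values and the ordering of retries versus fallbacks are assumptions, not the routing engine's documented algorithm.

```python
import time
from typing import Callable, List, Optional, Sequence


class UpstreamError(Exception):
    """Hypothetical error type standing in for a provider 5xx response."""


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], str],
    retries: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; retry each up to `retries` times with backoff."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return call_model(model)
            except UpstreamError as exc:
                last_error = exc
                if attempt < retries:
                    sleep(base_delay_s * (2 ** attempt))  # 0.5 s, 1 s, ...
    raise RuntimeError("all models in the fallback chain failed") from last_error


# Usage: the first two models always fail, the third one answers.
calls: List[str] = []


def flaky(model: str) -> str:
    calls.append(model)
    if model != "gpt-4o":
        raise UpstreamError(f"{model} unavailable")
    return f"answer from {model}"


chain = ["claude-sonnet-4-5", "claude-3-7-sonnet", "gpt-4o"]
answer = call_with_fallback(chain, flaky, sleep=lambda seconds: None)
```

On the shared instance this logic runs inside the gateway, so application code normally sends a single request and does not need its own fallback loop.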
5. Observability
| Dashboard | Data source | Purpose |
|---|---|---|
| Overview | /spend/* + gateway metrics | KPI overview (QPS / Spend / Latency / Error %) |
| System Health | /health/* + per-component probes | Live status of gateway · routing engine · cache · database |
| Anomaly | Call logs + risk rules | Anomaly event stream |
| API Keys | /key/list | All issued keys & status |
| Channels | Custom model bundles | Channel list / default channel |
For external SIEM / Grafana wiring use:
- GET /metrics: Prometheus
- GET /spend/logs: JSON call detail
- Webhooks: configure under Profile → Notifications
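A minimal sketch of pulling call details into your own tooling might look like the following. Several details are assumptions rather than documented behaviour: that /spend/logs is served from the api.hydite.com root rather than under /v1, that it accepts the same Bearer key as the chat endpoint, and that it returns a JSON body. Check the API Reference before relying on any of them.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.hydite.com"  # assumed host root for the reporting endpoints


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request (Bearer auth is an assumption here)."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_json(path: str, api_key: str):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=10) as resp:
        return json.load(resp)


# Building the request does not touch the network; fetch_json() does.
req = build_request("/spend/logs", os.environ.get("HYDITE_API_KEY", "sk-hydite-..."))
print(req.get_method(), req.full_url)
```

Calling `fetch_json("/spend/logs", key)` would then return the decoded records, ready to forward to a SIEM or dashboard.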
6. Data compliance
- Shared edge does not persist prompt or completion bodies by default — only usage metadata.
- Need full audit logs? Enable Logging per channel (independent encryption, separate bucket, configurable retention).
- All transport is TLS 1.3 with HSTS on every host.
- We do not train on your data, and the metadata: { "no_log": true } opt-out is forwarded to providers that honour it.
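The no_log opt-out can be attached when the request body is built. The sketch below assumes that `metadata` sits at the top level of the chat completions payload, next to `model` and `messages`; the helper name `build_chat_payload` is hypothetical, and the exact placement should be confirmed in the API Reference.

```python
import json
from typing import Any, Dict, List


def build_chat_payload(
    model: str,
    messages: List[Dict[str, str]],
    no_log: bool = False,
) -> Dict[str, Any]:
    """Build a chat completions body, optionally carrying the no_log opt-out."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    if no_log:
        # Assumed placement: top-level "metadata" object in the request body.
        payload["metadata"] = {"no_log": True}
    return payload


payload = build_chat_payload(
    "deepseek-v3",
    [{"role": "user", "content": "ping"}],
    no_log=True,
)
body = json.dumps(payload)  # ready to send as the POST body
print(body)
```

When using the OpenAI Python SDK, the same field could be passed through the `extra_body` argument of `client.chat.completions.create`, which merges extra keys into the request JSON.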
7. Shared vs Dedicated
| | Shared Edge | Dedicated |
|---|---|---|
| Location | Hydite multi-region edge | Single-tenant (cloud / on-prem) |
| Entry cost | Pay-as-you-go | Monthly base + usage |
| Custom routing | Dashboard presets | Full YAML control |
| Data residency | Not persisted | 100% customer-controlled |
| SLA | 99.9% | 99.95%+ |
| Sweet spot | < 1k sustained RPM | Any scale |
Upgrade path: Workspaces → Upgrade to Dedicated — your key and quotas migrate seamlessly with no code changes.
Next steps
- Browse every endpoint → API Reference
- Multi-environment isolation → API Key Groups
- Rate limit best practices → Rate Limiting
- Evaluate enterprise → Dedicated Instance