Shared Edge Instance

Zero-ops, multi-tenant LLM gateway — top up and start calling every model

The Shared Edge Instance is the default deployment of Hydite Vtslx AO. There is no infrastructure to provision: claim an API key in the dashboard and you can immediately call every supported provider and model. All tenants share a multi-region AO gateway cluster operated by Hydite — we handle HA, autoscaling, security patches and cost optimisation for you.

When the shared instance is right

Best fit	Not the right fit
Solo devs, startups, AI app PoCs	Regulated workloads requiring data residency
~M to ~10M tokens / month	Sustained > 1k RPM peak per single instance
One key, many providers	Dedicated egress IPs or 99.99% SLA
Pay-as-you-go pricing	HIPAA / SOC2 / China-MLPS-3 audit

If your workload falls in the right-hand column, see Dedicated Instance instead.

Architecture

                      ┌──────────────────────────────┐
  Your App ── HTTPS ─▶│  Edge Gateway (Multi-Region) │
                      │  • TLS / Auth / Rate limit   │
                      │  • Anomaly drop / Routing    │
                      └──────────────┬───────────────┘
                                     │
                      ┌──────────────▼───────────────┐
                      │       AO Routing Engine      │
                      │  • Routing & fallbacks       │
                      │  • Cost accounting           │
                      │  • Semantic cache            │
                      │  • Guardrails / Tools        │
                      └──────────────┬───────────────┘
                ┌────────────────────┼────────────────────┐
                ▼                    ▼                    ▼
           OpenAI · Anthropic · Google · DeepSeek · Qwen · Zhipu · …

Edge Gateway terminates TLS, authenticates virtual keys, enforces global rate limits and runs anomaly drops.
AO Routing Engine is OpenAI-compatible, handles provider routing, automatic fallbacks, semantic caching and cost accounting.
A built-in high-speed cache, backed by a persistence layer, keeps keys, usage and quotas fast to read and durable.
Requests are handled in the edge region closest to the caller, over TLS 1.3 and HTTP/2.

1. Three-step onboarding

Step 1 — Get an API key

The API Keys dashboard lists every issued virtual key. New users obtain their first key via:

Activation Code — redeem a one-time code on the Activation Codes page (typical for partner / referral distribution).
Credit Code — top up any USD amount on the Credit Codes page; usage is metered against the balance in real time.
Plan subscription — subscribe to a Pro / Team plan on Plans for monthly bundled quota and higher limits.
Enterprise invoice — sales-issued accounts can mint keys via the Workspaces screen.

Issued keys look like:

sk-hydite-3f2a8b9c0d1e4f5a6b7c8d9e0f1a2b3c

⚠️ The full key is shown only once at creation time — store it safely. Lost a key? Use Regenerate in the dashboard.
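Because the key is shown only once, a common pattern is to keep it in an environment variable, read it at startup, and redact it before it reaches any log. The sketch below assumes only what the sample key above shows, namely that keys start with `sk-hydite-`; the helper names `load_api_key` and `mask_key` are hypothetical.

```python
import os
from typing import Mapping

KEY_PREFIX = "sk-hydite-"  # assumption: inferred from the sample key above


def load_api_key(environ: Mapping[str, str] = os.environ,
                 var: str = "HYDITE_API_KEY") -> str:
    """Read the key from the environment and fail fast if it is absent or malformed."""
    key = environ.get(var, "").strip()
    if not key.startswith(KEY_PREFIX):
        raise RuntimeError(f"{var} is not set or does not start with {KEY_PREFIX!r}")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Redact a key for logs, keeping only the prefix and the last few characters."""
    prefix = KEY_PREFIX if key.startswith(KEY_PREFIX) else ""
    return f"{prefix}...{key[-visible:]}"


print(mask_key("sk-hydite-3f2a8b9c0d1e4f5a6b7c8d9e0f1a2b3c"))  # sk-hydite-...2b3c
```

Passing the environment as a parameter keeps `load_api_key` easy to test without touching the real process environment.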

Step 2 — Pick a Channel (optional)

Every account ships with a Default Channel wired to Hydite's curated, best-value model mix. Create extra channels in Channels when you need to:

Isolate dev / staging / prod traffic
Hand a tenant a custom model whitelist
Bill / route a specific workload separately

Each Channel maps to its own model bundle, so routing, rate limits and cost accounting are all scoped per channel.

Step 3 — Make your first call

export HYDITE_API_KEY=sk-hydite-...

curl https://api.hydite.com/v1/chat/completions \
  -H "Authorization: Bearer $HYDITE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role":"user","content":"hi"}]
  }'

Or use the official OpenAI SDK; only the API key and base URL change:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["HYDITE_API_KEY"],
    base_url="https://api.hydite.com/v1",  # send requests to the Hydite gateway
)
resp = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

Full endpoint catalogue → API Reference.

2. Quotas & billing

The Billing and Revenue dashboards show, in real time:

Spend / Budget drill-down by key / user / team / channel.
Token usage roll-up by model and time bucket (hour / day / month).
Top Models / Top Users leaderboards for fast cost attribution.

Every request is billed at prompt + completion token cost, computed by the AO routing engine against the latest provider price sheets and normalised to USD. Transactions are retained for at least 90 days and exposed via /spend/logs.
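The billing rule above reduces to simple arithmetic: each token count times its per-token price, summed over prompt and completion. The sketch assumes prices quoted in USD per 1M tokens and uses made-up placeholder rates; real charges come from the provider price sheets the routing engine applies. In an OpenAI-compatible response the token counts would typically come from the `usage` block (`prompt_tokens`, `completion_tokens`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical USD price per 1M tokens for one model."""
    prompt_per_m: float
    completion_per_m: float


def request_cost_usd(prompt_tokens: int, completion_tokens: int, price: Price) -> float:
    """Prompt cost plus completion cost for a single request."""
    return (prompt_tokens * price.prompt_per_m
            + completion_tokens * price.completion_per_m) / 1_000_000


# Placeholder rates for illustration only, not actual Hydite or provider prices.
example = Price(prompt_per_m=3.00, completion_per_m=15.00)
cost = request_cost_usd(prompt_tokens=1_200, completion_tokens=300, price=example)
print(f"${cost:.4f}")  # $0.0081
```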

Quota dimensions

You can attach any combination of the following limits to a key (or its parent team / org):

Dimension	Behaviour
Budget	Hard USD cap — exceeding returns `429 budget_exceeded`
Soft Budget	Alert-only threshold
RPM / TPM	Requests / tokens per minute
Models	Allow / deny list
Expires	Automatic deactivation on a date
Allowed IPs	Only listed source (egress) IPs may use the key (leak protection)

Limits stack across the three-tier structure: Key ⊆ Team ⊆ Organization. The strictest layer wins.
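One way to read "the strictest layer wins": for numeric limits (Budget, RPM, TPM) take the smallest value set at any tier, and for model allow-lists take the intersection. This is a minimal sketch of that reading, not Hydite's enforcement code; `None` stands for "not set at this tier", and the function names are hypothetical.

```python
from typing import Optional


def effective_limit(*tiers: Optional[float]) -> Optional[float]:
    """Strictest numeric limit across Key, Team and Org; None means 'not set'."""
    values = [v for v in tiers if v is not None]
    return min(values) if values else None


def effective_models(*allow_lists: Optional[set[str]]) -> Optional[set[str]]:
    """Models permitted by every tier; None means the tier does not restrict models."""
    result: Optional[set[str]] = None
    for allowed in allow_lists:
        if allowed is None:
            continue
        result = set(allowed) if result is None else result & allowed
    return result


# Hypothetical settings: key RPM 120, team RPM 60, org RPM unset.
print(effective_limit(120, 60, None))  # 60
print(sorted(effective_models({"gpt-4o", "deepseek-v3"}, {"deepseek-v3"}, None)))  # ['deepseek-v3']
```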

3. Rate limiting & anomaly protection

The shared instance ships several anti-abuse layers turned on by default:

Gateway layer — global sliding-window limits per IP / key (default 60 RPM, 1M TPM, configurable in the dashboard).

Routing layer — per-key RPM / TPM counters; exceeding either returns a standard 429:

{ "error": { "type": "rate_limit_error", "code": "rpm_limit", "message": "Rate limit exceeded" } }

Anomaly Detection — the Anomaly dashboard auto-flags:
- Spend spikes > 10× historical mean for a single key
- Same key concurrently used from multiple regions / IPs (likely leak)
- Unusually large token counts (possible prompt injection or data exfiltration)
- Sudden error-rate spikes (jailbreak attempt or upstream issue)
When a rule matches, the key is paused automatically and an alert is emailed; resuming takes one click.
Guardrails — opt in to PII redaction, content moderation and prompt-injection detection per key or channel via /guardrails/apply_guardrail.
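Because a hard budget cap and a per-minute rate limit both surface as HTTP 429, a client can inspect the error body to decide whether a retry makes sense. The sketch assumes the budget error uses the same `error` envelope as the rate-limit example above, with `code` set to `budget_exceeded`; `retry_delay_for_429` is a hypothetical helper and its backoff constants are arbitrary illustrative values.

```python
import json
from typing import Optional


def retry_delay_for_429(body: str, attempt: int, base_delay_s: float = 1.0,
                        max_delay_s: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying a 429, or None if a retry cannot succeed."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    if code == "budget_exceeded":
        return None  # hard USD cap: waiting will not help; top up or raise the budget
    # rpm_limit (and any other 429): exponential backoff, capped at max_delay_s.
    return min(max_delay_s, base_delay_s * 2 ** attempt)


rate_limited = ('{ "error": { "type": "rate_limit_error", "code": "rpm_limit",'
                ' "message": "Rate limit exceeded" } }')
print(retry_delay_for_429(rate_limited, attempt=2))  # 4.0
```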

4. Routing & resilience

The AO routing engine enables the full multi-provider routing toolkit on the shared instance:

Latency-based routing — pick the lowest P95 deployment.
Cost-based routing — pick the cheapest equivalent model.
Fallback chains — e.g. claude-sonnet-4-5 → claude-3-7-sonnet → gpt-4o.
Retry policy — automatically retries idempotent requests that fail with a 5xx (2 retries by default, with exponential backoff).
Semantic cache — channel-level toggle; cache hits cost zero tokens.

Every routing decision, fallback and retry is logged to /spend/logs and aggregated in System Health.
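The gateway does this for you; the sketch below only makes the order of operations concrete under one reading of the policy: each model in the chain gets up to two retries on 5xx with exponential backoff before the next model is tried. `UpstreamError` and `fake_call` are hypothetical stand-ins for a provider call, and the random jitter is an added assumption rather than documented behaviour.

```python
import random
import time
from typing import Callable, Optional, Sequence


class UpstreamError(Exception):
    """Hypothetical provider failure carrying an HTTP status code."""
    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status


def call_with_fallbacks(
    models: Sequence[str],
    call: Callable[[str], str],
    retries: int = 2,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Walk the fallback chain; retry each model on 5xx with exponential backoff."""
    last_error: Optional[UpstreamError] = None
    for model in models:
        for attempt in range(retries + 1):  # first try plus `retries` retries
            try:
                return call(model)
            except UpstreamError as err:
                if err.status < 500:
                    raise  # client errors are neither retried nor re-routed here
                last_error = err
                if attempt < retries:
                    # Exponential backoff (0.5 s, 1 s, ...) plus a little random jitter.
                    sleep(base_delay_s * 2 ** attempt + random.uniform(0, base_delay_s))
    raise RuntimeError("every model in the fallback chain failed") from last_error


# Usage with a fake provider call: the primary model is "down", the next one answers.
calls: list[str] = []


def fake_call(model: str) -> str:
    calls.append(model)
    if model == "claude-sonnet-4-5":
        raise UpstreamError(503)
    return f"answer from {model}"


result = call_with_fallbacks(
    ["claude-sonnet-4-5", "claude-3-7-sonnet", "gpt-4o"], fake_call, sleep=lambda s: None
)
print(result)  # answer from claude-3-7-sonnet
```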

5. Observability

Dashboard	Data source	Purpose
Overview	`/spend/*` + gateway metrics	KPI overview (QPS / Spend / Latency / Error %)
System Health	`/health/*` + per-component probes	Live status of gateway · routing engine · cache · database
Anomaly	Call logs + risk rules	Anomaly event stream
API Keys	`/key/list`	All issued keys & status
Channels	Custom model bundles	Channel list / default channel

To feed an external SIEM or Grafana, use:

GET /metrics — Prometheus
GET /spend/logs — JSON call detail
Webhooks — configure under Profile → Notifications
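As an example of wiring `/spend/logs` into your own tooling, the sketch below fetches the call log and rolls spend up by model, similar to the Top Models leaderboard. The host, the absence of a `/v1` prefix, Bearer authentication with your API key, and the `model` / `spend` record fields are all assumptions; check the API Reference for the actual path and schema.

```python
import json
import urllib.request
from collections import defaultdict
from typing import Any, Iterable, Mapping

# Assumptions (not confirmed by this page): the endpoint lives at the API host
# without a /v1 prefix, accepts the same Bearer key, and returns a JSON array
# of records with "model" and "spend" (USD) fields.
BASE_URL = "https://api.hydite.com"


def fetch_spend_logs(api_key: str, base_url: str = BASE_URL) -> list[dict[str, Any]]:
    """GET /spend/logs and decode the JSON body."""
    req = urllib.request.Request(
        f"{base_url}/spend/logs",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def spend_by_model(records: Iterable[Mapping[str, Any]]) -> dict[str, float]:
    """Sum spend per model, largest first (a Top Models-style roll-up)."""
    totals: defaultdict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["model"]] += float(rec.get("spend", 0.0))
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Usage (makes a network call):
#   logs = fetch_spend_logs(os.environ["HYDITE_API_KEY"])
#   for model, usd in spend_by_model(logs).items():
#       print(f"{model:<30} ${usd:.4f}")
```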

6. Data compliance

By default, the shared edge persists only usage metadata, not prompt or completion bodies.
Need full audit logs? Enable Logging per channel (independent encryption, separate bucket, configurable retention).
All transport is TLS 1.3 with HSTS on every host.
We do not train on your data. If a request sets `"metadata": { "no_log": true }`, that opt-out is forwarded to providers that honour it.
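For raw HTTP clients, the opt-out is just an extra field in the request body. The sketch assumes `metadata` sits at the top level of an OpenAI-compatible chat body, as the wording above suggests; `chat_request_body` is a hypothetical helper.

```python
import json
from typing import Any


def chat_request_body(model: str, messages: list[dict[str, str]],
                      no_log: bool = False) -> dict[str, Any]:
    """Build an OpenAI-compatible chat body, optionally carrying the no_log opt-out."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    if no_log:
        # Assumption: the opt-out travels as a top-level "metadata" object.
        body["metadata"] = {"no_log": True}
    return body


payload = chat_request_body("deepseek-v3", [{"role": "user", "content": "ping"}], no_log=True)
print(json.dumps(payload))
```

With the official OpenAI Python SDK, the same field can be passed through `extra_body={"metadata": {"no_log": True}}` on `chat.completions.create`.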

7. Shared vs Dedicated

	Shared Edge	Dedicated
Location	Hydite multi-region edge	Single-tenant (cloud / on-prem)
Entry cost	Pay-as-you-go	Monthly base + usage
Custom routing	Dashboard presets	Full YAML control
Data residency	Not persisted	100% customer-controlled
SLA	99.9%	99.95%+
Sweet spot	< 1k sustained RPM	Any scale

Upgrade path: Workspaces → Upgrade to Dedicated — your key and quotas migrate seamlessly with no code changes.

Next steps

Browse every endpoint → API Reference
Multi-environment isolation → API Key Groups
Rate limit best practices → Rate Limiting
Evaluate enterprise → Dedicated Instance
