Inventory & health

Liveness probe

Returns 200 with a small JSON body when the worker is ready.

Connection refused or non-200 means aistack is down or still starting; consumers should surface “service unreachable” with a hint to start the dev server. This endpoint never blocks on backend health — it only confirms the FastAPI worker itself is alive.

Successful response.
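
A client-side check for this probe might look like the sketch below. It treats a refused connection, a non-200 status, and an unparseable body all as "service unreachable", as described above. The base URL, the function names, and the exact wording of the hint are assumptions for illustration; only the /health path and the status and version fields come from this page.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Assumption: the gateway address. This page does not state a host or port.
BASE_URL = "http://localhost:8000"

UNREACHABLE_HINT = "service unreachable: is the aistack dev server running?"


def interpret_health(status_code: Optional[int], body: bytes) -> str:
    """Map a raw probe result to a user-facing message.

    status_code is None when the connection itself failed.
    """
    if status_code != 200:
        return UNREACHABLE_HINT
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return UNREACHABLE_HINT
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return UNREACHABLE_HINT
    return f"aistack {payload.get('version', 'unknown')} is up"


def probe(base_url: str = BASE_URL, timeout: float = 2.0) -> str:
    """Call GET /health once and describe the outcome."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return interpret_health(resp.status, resp.read())
    except urllib.error.HTTPError as exc:  # the server answered with an error status
        return interpret_health(exc.code, b"")
    except (urllib.error.URLError, OSError):  # refused, timed out, DNS failure
        return interpret_health(None, b"")
```

Keeping the interpretation separate from the network call lets the mapping be unit-tested without a running gateway.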

List servable models

OpenAI-compatible model list. Lists only backends that will actually serve a request right now:

  • ASR entries are emitted for every provider whose ML library is importable in the running venv (pure import probe — no model weights are loaded).
  • The TTS entry is emitted only if the Qwen3-TTS upstream container responds to its /health probe.
  • LLM entries are aggregated from Ollama’s /api/tags when the daemon is reachable; an unreachable Ollama silently contributes zero entries rather than failing the whole listing.
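
The "unreachable backend contributes zero entries" behaviour in the last bullet can be sketched as follows. This is not the aistack implementation, only an illustration of the pattern: the function names and the injectable fetch parameter are hypothetical. The /api/tags path and its top-level "models" array with a "name" per model follow Ollama's public API, and 11434 is Ollama's default port.

```python
import json
import urllib.request
from typing import Callable, List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port; adjust as needed


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 1.0) -> dict:
    """Fetch the raw /api/tags payload from a local Ollama daemon."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def ollama_entries(fetch: Callable[[], dict] = fetch_tags) -> List[str]:
    """Return Ollama model names, or an empty list if the daemon is unreachable."""
    try:
        tags = fetch()
    except (OSError, ValueError):
        # Refused connection, timeout, HTTP error, or invalid JSON:
        # contribute nothing instead of failing the whole listing.
        return []
    if not isinstance(tags, dict):
        return []
    return [m["name"] for m in tags.get("models", []) if "name" in m]
```

Swallowing the error here is a deliberate trade-off: the overall listing stays available, at the cost of the caller not being told why the LLM entries are missing.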

Clients can read this once on startup to know which capabilities are available, build language-aware pickers, and skip a 503 round-trip on first call. The response shape is OpenAI-compatible plus the aistack extension fields capabilities, languages, supports_streaming, and is_routing_alias per entry — see the ModelEntry schema.

Inventory of every model the gateway can serve right now, across ASR / TTS / LLM, plus the ‘auto’ routing alias when at least one ASR backend is reachable.
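
As a rough sketch of the startup read described above, a client could index the listing by capability and filter by language. The field types are assumptions, since this page names the extension fields but not their types: capabilities and languages are taken to be lists of strings, is_routing_alias a boolean. The sample model ids and the capability values "asr" and "tts" are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical payload in the shape of GET /v1/models; ids are made up.
SAMPLE_LISTING: Dict[str, Any] = {
    "object": "list",
    "data": [
        {"id": "auto", "capabilities": ["asr"], "languages": ["en", "de"],
         "supports_streaming": False, "is_routing_alias": True},
        {"id": "example-asr", "capabilities": ["asr"], "languages": ["en", "de"],
         "supports_streaming": False, "is_routing_alias": False},
        {"id": "example-tts", "capabilities": ["tts"], "languages": ["en"],
         "supports_streaming": True, "is_routing_alias": False},
    ],
}


def index_by_capability(listing: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group concrete model ids by capability, skipping routing aliases."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry in listing.get("data", []):
        if entry.get("is_routing_alias"):
            continue
        for capability in entry.get("capabilities", []):
            index[capability].append(entry["id"])
    return dict(index)


def models_for_language(listing: Dict[str, Any], capability: str, language: str) -> List[str]:
    """List concrete model ids offering a capability for a given language."""
    return [
        entry["id"]
        for entry in listing.get("data", [])
        if not entry.get("is_routing_alias")
        and capability in entry.get("capabilities", [])
        and language in entry.get("languages", [])
    ]
```

With the sample above, a capability that is absent from the index (here, any LLM capability) tells the client to hide that feature rather than attempt a call that would fail. Whether a picker should also offer the 'auto' alias is a design choice; this sketch leaves it out.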


Response shape for GET /health.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| status | enum ('ok') | yes | Always ‘ok’ when this endpoint responds. Connection refused / non-200 means the gateway is down. |
| version | string | yes | aistack package version (PEP 440). |

Response shape for GET /v1/models — OpenAI-compatible.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| object | enum ('list') | no | Always ‘list’ (OpenAI compatibility). |
| data | array of ModelEntry | yes | One entry per servable model + the ‘auto’ routing alias when at least one ASR backend is reachable. |