TTS — text to speech
DELETE /v1/audio/{path:path}
Proxy to Qwen3-TTS (DELETE)
Transparent reverse proxy for /v1/audio/* to the Qwen3-TTS-12Hz-0.6B-CustomVoice container.
Transparent. Request body, headers (minus hop-by-hop), and response body all flow through unchanged. aistack does not transcode audio, swap voices, or adapt OpenAI’s spec — what Qwen3-TTS returns is what the client receives. The OpenAI-compatible request/response schemas are documented authoritatively at https://platform.openai.com/docs/api-reference/audio.
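A minimal sketch of what "headers minus hop-by-hop" can look like in practice. The header set below is the classic RFC 2616 section 13.5.1 list plus anything named in `Connection`; the exact set aistack strips is not stated here, and `strip_hop_by_hop` is a hypothetical helper name.

```python
# Hop-by-hop headers apply to a single connection and must not be forwarded.
# Assumption: this is the RFC 2616 list (both "trailer" and "trailers" spellings);
# the gateway's real list may differ.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
})


def strip_hop_by_hop(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` that is safe to forward to the upstream."""
    # Any header named inside the Connection header is hop-by-hop as well.
    named_in_connection = {
        token.strip().lower()
        for name, value in headers.items() if name.lower() == "connection"
        for token in value.split(",") if token.strip()
    }
    drop = HOP_BY_HOP | named_in_connection
    return {k: v for k, v in headers.items() if k.lower() not in drop}
```

End-to-end headers such as `Authorization` and `Content-Type` pass through untouched, which is what keeps the proxy transparent to clients.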
GPU scheduling. Holds the global gateway GPU slot for the duration of the upstream call. The Qwen3-TTS container generates on the same physical GPU as in-process ASR / LLM workloads, so the slot represents “GPU is doing inference” regardless of which process owns the kernels. A request that arrives while the slot is held gets HTTP 503 with a Retry-After header.
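The single-slot behaviour can be sketched as a non-blocking lock: a request either takes the slot immediately or is rejected with 503. This is an illustration only. `GpuSlot`, `SlotBusy`, `handle`, and the `Retry-After` value are hypothetical, and the sketch uses `threading` where the real gateway may well use an asyncio primitive.

```python
import threading
from contextlib import contextmanager


class SlotBusy(Exception):
    """Raised when the single GPU slot is already held."""

    def __init__(self, retry_after_s: int):
        super().__init__("GPU slot busy")
        self.retry_after_s = retry_after_s


class GpuSlot:
    """One global slot; acquisition never waits."""

    def __init__(self, retry_after_s: int = 1):  # assumed default
        self._lock = threading.Lock()
        self.retry_after_s = retry_after_s

    @contextmanager
    def hold(self):
        if not self._lock.acquire(blocking=False):
            raise SlotBusy(self.retry_after_s)
        try:
            yield
        finally:
            # Released even if the upstream call raises.
            self._lock.release()


def handle(slot: GpuSlot, upstream_call):
    """Return (status, headers, body) for one proxied request."""
    try:
        with slot.hold():
            return 200, {}, upstream_call()
    except SlotBusy as exc:
        return 503, {"Retry-After": str(exc.retry_after_s)}, b""
```

Rejecting instead of queueing keeps latency predictable: the client decides whether to retry, and the gateway never accumulates a backlog of GPU work.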
Streaming. The upstream is consumed and forwarded chunk-by-chunk so multi-MB audio responses don’t buffer entirely in memory. Client disconnect propagates: aistack closes the upstream connection so the container can abort generation early.
Error mapping. ConnectError on the upstream → 503 with a hint to start the docker compose stack. Other httpx errors → 502 with the upstream exception type/message in the envelope.
Parameters
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| path | path | string | yes | |
Responses
Upstream Qwen3-TTS response, forwarded verbatim. For POST /v1/audio/speech the body is raw audio bytes (content-type per OpenAI spec); other paths under /v1/audio/* (clone-voice / list-voices / etc.) preserve the upstream’s content-type and shape.
audio/mpeg → string
audio/wav → string
audio/opus → string
application/json → object
Qwen3-TTS upstream produced an unexpected error.
application/json → ErrorEnvelope
Either the GPU slot is busy serving another inference (gateway-level), or the Qwen3-TTS container is unreachable. The error envelope’s provider field distinguishes the two.
application/json → ErrorEnvelope
GET /v1/audio/{path:path}
Proxy to Qwen3-TTS (GET)
Transparent reverse proxy for /v1/audio/* to the Qwen3-TTS-12Hz-0.6B-CustomVoice container.
Transparent. Request body, headers (minus hop-by-hop), and response body all flow through unchanged. aistack does not transcode audio, swap voices, or adapt OpenAI’s spec — what Qwen3-TTS returns is what the client receives. The OpenAI-compatible request/response schemas are documented authoritatively at https://platform.openai.com/docs/api-reference/audio.
GPU scheduling. Holds the global gateway GPU slot for the duration of the upstream call. The Qwen3-TTS container generates on the same physical GPU as in-process ASR / LLM workloads, so the slot represents “GPU is doing inference” regardless of which process owns the kernels. A request that arrives while the slot is held gets HTTP 503 with a Retry-After header.
Streaming. The upstream is consumed and forwarded chunk-by-chunk so multi-MB audio responses don’t buffer entirely in memory. Client disconnect propagates: aistack closes the upstream connection so the container can abort generation early.
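The chunk-by-chunk forwarding and disconnect propagation described above can be sketched with a plain generator. `FakeUpstream` stands in for the real upstream response object and `forward` is a hypothetical name; neither is taken from aistack's code.

```python
from typing import Iterable, Iterator


class FakeUpstream:
    """Stand-in for a streaming upstream response (illustrative only)."""

    def __init__(self, chunks: Iterable[bytes]):
        self._chunks = list(chunks)
        self.closed = False

    def iter_bytes(self) -> Iterator[bytes]:
        yield from self._chunks

    def close(self) -> None:
        self.closed = True


def forward(upstream) -> Iterator[bytes]:
    """Yield upstream chunks one at a time; never hold the whole body."""
    try:
        for chunk in upstream.iter_bytes():
            yield chunk
    finally:
        # Runs on normal completion and when the consumer stops early
        # (for example on client disconnect), so the upstream connection
        # is always closed and generation can be aborted.
        upstream.close()
```

Closing the generator early (`gen.close()`) triggers the `finally` block, which is the mechanism that lets a client disconnect reach the upstream.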
Error mapping. ConnectError on the upstream → 503 with a hint to start the docker compose stack. Other httpx errors → 502 with the upstream exception type/message in the envelope.
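The mapping above can be sketched as a small function. To keep the example dependency-free, the two exception classes below are stand-ins for `httpx.HTTPError` and its subclass `httpx.ConnectError`; the envelope field names other than `error` and `provider` (`type`, `message`, `hint`) and the `provider` value are assumptions, not the documented ErrorBody schema.

```python
class UpstreamError(Exception):
    """Stand-in for httpx.HTTPError (base class of httpx errors)."""


class UpstreamConnectError(UpstreamError):
    """Stand-in for httpx.ConnectError."""


def map_upstream_error(exc: UpstreamError) -> tuple[int, dict]:
    """Translate an upstream failure into (status, error envelope)."""
    body = {
        "provider": "qwen3-tts",       # assumed value
        "type": type(exc).__name__,    # assumed field name
        "message": str(exc),           # assumed field name
    }
    if isinstance(exc, UpstreamConnectError):
        # Container unreachable: tell the operator how to recover.
        body["hint"] = "Is the docker compose stack running?"
        return 503, {"error": body}
    # Any other transport or protocol failure is a bad gateway.
    return 502, {"error": body}
```

Checking the more specific exception first matters because the connect error is a subclass of the general one.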
Parameters
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| path | path | string | yes | |
Responses
Upstream Qwen3-TTS response, forwarded verbatim. For POST /v1/audio/speech the body is raw audio bytes (content-type per OpenAI spec); other paths under /v1/audio/* (clone-voice / list-voices / etc.) preserve the upstream’s content-type and shape.
audio/mpeg → string
audio/wav → string
audio/opus → string
application/json → object
Qwen3-TTS upstream produced an unexpected error.
application/json → ErrorEnvelope
Either the GPU slot is busy serving another inference (gateway-level), or the Qwen3-TTS container is unreachable. The error envelope’s provider field distinguishes the two.
application/json → ErrorEnvelope
POST /v1/audio/{path:path}
Proxy to Qwen3-TTS (POST)
Transparent reverse proxy for /v1/audio/* to the Qwen3-TTS-12Hz-0.6B-CustomVoice container.
Transparent. Request body, headers (minus hop-by-hop), and response body all flow through unchanged. aistack does not transcode audio, swap voices, or adapt OpenAI’s spec — what Qwen3-TTS returns is what the client receives. The OpenAI-compatible request/response schemas are documented authoritatively at https://platform.openai.com/docs/api-reference/audio.
GPU scheduling. Holds the global gateway GPU slot for the duration of the upstream call. The Qwen3-TTS container generates on the same physical GPU as in-process ASR / LLM workloads, so the slot represents “GPU is doing inference” regardless of which process owns the kernels. A request that arrives while the slot is held gets HTTP 503 with a Retry-After header.
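On the client side, a 503 with `Retry-After` is usually handled with a bounded retry loop. The sketch below is one way to do that, assuming the header carries a number of seconds; `call_with_retry` and `retry_delay` are hypothetical helpers, and `send` is any callable returning `(status, headers, body)`.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait, taken from Retry-After when it is numeric."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        # Header missing or in HTTP-date form: fall back to a fixed delay.
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it stops returning 503 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response[0] != 503 or attempt == max_attempts - 1:
            return response
        sleep(retry_delay(response[1]))
    raise AssertionError("unreachable")
```

Because 503 also covers an unreachable container, a production client would likely inspect the error envelope before retrying rather than retry blindly.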
Streaming. The upstream is consumed and forwarded chunk-by-chunk so multi-MB audio responses don’t buffer entirely in memory. Client disconnect propagates: aistack closes the upstream connection so the container can abort generation early.
Error mapping. ConnectError on the upstream → 503 with a hint to start the docker compose stack. Other httpx errors → 502 with the upstream exception type/message in the envelope.
Parameters
| Name | In | Type | Required | Description |
|---|---|---|---|---|
| path | path | string | yes | |
Responses
Upstream Qwen3-TTS response, forwarded verbatim. For POST /v1/audio/speech the body is raw audio bytes (content-type per OpenAI spec); other paths under /v1/audio/* (clone-voice / list-voices / etc.) preserve the upstream’s content-type and shape.
audio/mpeg → string
audio/wav → string
audio/opus → string
application/json → object
Qwen3-TTS upstream produced an unexpected error.
application/json → ErrorEnvelope
Either the GPU slot is busy serving another inference (gateway-level), or the Qwen3-TTS container is unreachable. The error envelope’s provider field distinguishes the two.
application/json → ErrorEnvelope
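As an illustration of calling POST /v1/audio/speech through the proxy, the sketch below builds an OpenAI-style speech request with the standard library and streams the audio to disk. The base URL, model id, and voice name are placeholders chosen for the example; the body fields (`model`, `input`, `voice`, `response_format`) follow the OpenAI audio spec referenced above.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str,
                         response_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {
        "model": "qwen3-tts",  # placeholder model id, an assumption
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, out_path: str,
                chunk_size: int = 64 * 1024) -> None:
    """Send the request and write the raw audio bytes in chunks."""
    # urlopen raises urllib.error.HTTPError for non-2xx responses such as 502/503.
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as out:
        while chunk := resp.read(chunk_size):
            out.write(chunk)
```

A call might look like `save_speech(build_speech_request("http://localhost:8000", "Hello", "demo-voice"), "hello.wav")`, where both the host and the voice are hypothetical.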
Schemas
ErrorEnvelope {#schema-errorenvelope}
Wire format for every non-2xx response from aistack.
The shape is identical regardless of which endpoint produced the error, so consumers can write one error-handling helper and reuse it across capabilities.
| Field | Type | Required | Description |
|---|---|---|---|
| error | ErrorBody | yes | |
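Because the envelope shape is shared, a single client-side helper can cover every endpoint. The sketch below assumes only that the body is JSON with a top-level `error` object; reading `provider` from inside that object is an assumption about ErrorBody, and `AistackError` and `raise_for_envelope` are hypothetical names.

```python
import json


class AistackError(Exception):
    """A non-2xx response carrying a decoded ErrorEnvelope."""

    def __init__(self, status: int, error: dict):
        super().__init__(f"aistack returned HTTP {status}")
        self.status = status
        self.error = error                     # ErrorBody as a plain dict
        self.provider = error.get("provider")  # assumed location of the field


def raise_for_envelope(status: int, body: bytes) -> None:
    """Do nothing on 2xx; otherwise raise AistackError."""
    if 200 <= status < 300:
        return
    try:
        error = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        # Body was not a well-formed envelope; keep the status regardless.
        error = {}
    if not isinstance(error, dict):
        error = {}
    raise AistackError(status, error)
```

Callers can then branch on `exc.status` and `exc.provider`, for example to tell a busy GPU slot from an unreachable container on a 503.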