TTS — text to speech

Proxy to Qwen3-TTS (DELETE)

Transparent reverse proxy for /v1/audio/* to the Qwen3-TTS-12Hz-0.6B-CustomVoice container.

Transparent. Request body, headers (minus hop-by-hop), and response body all flow through unchanged. aistack does not transcode audio, swap voices, or adapt OpenAI’s spec — what Qwen3-TTS returns is what the client receives. The OpenAI-compatible request/response schemas are documented authoritatively at https://platform.openai.com/docs/api-reference/audio.
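To make "minus hop-by-hop" concrete, here is a minimal sketch of the header filtering a transparent proxy typically applies. The header set is the standard HTTP/1.1 hop-by-hop list (RFC 2616 §13.5.1) plus anything named in the Connection header. The function name `strip_hop_by_hop` is hypothetical and is not taken from the aistack source.

```python
from typing import Dict, Mapping

# Standard HTTP/1.1 hop-by-hop headers; these describe a single connection
# and must not be forwarded by a proxy.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "trailers", "transfer-encoding", "upgrade",
})


def strip_hop_by_hop(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` without hop-by-hop entries (hypothetical helper)."""
    # Any header named inside the Connection header is hop-by-hop as well.
    listed = {
        token.strip().lower()
        for name, value in headers.items()
        if name.lower() == "connection"
        for token in value.split(",")
        if token.strip()
    }
    drop = HOP_BY_HOP | listed
    return {name: value for name, value in headers.items() if name.lower() not in drop}
```

End-to-end headers such as Content-Type and Authorization pass through untouched, which is what lets the client treat the proxy as if it were the upstream.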

GPU scheduling. Holds the global gateway GPU slot for the duration of the upstream call. The Qwen3-TTS container generates on the same physical GPU as in-process ASR / LLM workloads, so the slot represents “GPU is doing inference” regardless of which process owns the kernels. Concurrent requests get HTTP 503 with Retry-After.
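The slot behaviour can be pictured as a non-blocking lock: a request either takes the slot immediately or is rejected with 503 and Retry-After. The sketch below is an illustration under that assumption only. `GpuSlot`, `SlotBusy`, `handle`, and the retry interval are hypothetical names and values, and the real gateway may implement the slot differently.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, Tuple


class SlotBusy(Exception):
    """Raised when the single GPU slot is already held (hypothetical)."""

    def __init__(self, retry_after_s: int) -> None:
        super().__init__(f"GPU slot busy, retry in {retry_after_s}s")
        self.retry_after_s = retry_after_s


class GpuSlot:
    """One global slot; acquisition never blocks (hypothetical)."""

    def __init__(self, retry_after_s: int = 1) -> None:
        self._lock = threading.Lock()
        self._retry_after_s = retry_after_s

    @contextmanager
    def hold(self) -> Iterator[None]:
        # Fail fast instead of queueing behind the current holder.
        if not self._lock.acquire(blocking=False):
            raise SlotBusy(self._retry_after_s)
        try:
            yield
        finally:
            # Released only once the upstream call has finished or failed.
            self._lock.release()


def handle(slot: GpuSlot, call_upstream: Callable[[], bytes]) -> Tuple[int, Dict[str, str], bytes]:
    """Run `call_upstream` while holding the slot, or answer 503 + Retry-After."""
    try:
        with slot.hold():
            return 200, {}, call_upstream()
    except SlotBusy as busy:
        return 503, {"Retry-After": str(busy.retry_after_s)}, b""
```

Rejecting rather than queueing keeps the gateway from piling up requests behind a long generation; the client decides whether to retry.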

Streaming. The upstream is consumed and forwarded chunk-by-chunk so multi-MB audio responses don’t buffer entirely in memory. Client disconnect propagates: aistack closes the upstream connection so the container can abort generation early.
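A generator captures both halves of this paragraph: chunks are yielded as they arrive, and closing the generator (which is what happens when the client goes away) closes the upstream. The sketch below uses a stand-in upstream object so it runs without a network. `FakeUpstream` and `forward` are hypothetical names, not aistack's code.

```python
from typing import Iterable, Iterator, List


class FakeUpstream:
    """Hypothetical stand-in for a streaming upstream response."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks: List[bytes] = list(chunks)
        self.closed = False

    def iter_bytes(self) -> Iterator[bytes]:
        for chunk in self._chunks:
            if self.closed:
                return  # generation aborted early
            yield chunk

    def close(self) -> None:
        self.closed = True


def forward(upstream: FakeUpstream) -> Iterator[bytes]:
    """Relay chunks one at a time; always close the upstream afterwards."""
    try:
        for chunk in upstream.iter_bytes():
            yield chunk  # only one chunk is held in memory at a time
    finally:
        # Runs on normal completion and when the consumer closes the
        # generator mid-stream (the client-disconnect case).
        upstream.close()
```

Calling `close()` on the generator returned by `forward` after reading only the first chunk leaves the remaining chunks unread and marks the upstream closed.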

Error mapping. ConnectError on the upstream → 503 with a hint to start the docker compose stack. Other httpx errors → 502 with the upstream exception type/message in the envelope.
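The mapping reads naturally as a two-branch function. The sketch below uses local stand-in exception classes so it runs without httpx installed. In httpx itself, `httpx.ConnectError` is a subclass of `httpx.HTTPError`, which the stand-ins mirror. The envelope keys other than `error` and `provider`, and the provider value shown, are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple


class UpstreamError(Exception):
    """Stand-in for httpx.HTTPError (assumption: used to avoid the dependency)."""


class UpstreamConnectError(UpstreamError):
    """Stand-in for httpx.ConnectError."""


def map_upstream_error(exc: Exception) -> Tuple[int, Dict[str, Any]]:
    """Translate an upstream failure into (status, envelope). Hypothetical helper."""
    if isinstance(exc, UpstreamConnectError):
        # Container unreachable: 503 plus a hint on how to bring it up.
        return 503, {
            "error": {
                "provider": "qwen3-tts",  # assumed value
                "message": str(exc),
                "hint": "start the docker compose stack",  # assumed key
            }
        }
    # Any other upstream failure: 502 carrying the exception type and message.
    return 502, {
        "error": {
            "provider": "qwen3-tts",  # assumed value
            "type": type(exc).__name__,  # assumed key
            "message": str(exc),
        }
    }
```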

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | path | string | yes | |

Upstream Qwen3-TTS response, forwarded verbatim. For POST /v1/audio/speech the body is raw audio bytes (content-type per OpenAI spec); other paths under /v1/audio/* (clone-voice / list-voices / etc.) preserve the upstream’s content-type and shape.

  • audio/mpeg → string
  • audio/wav → string
  • audio/opus → string
  • application/json → object

502: Qwen3-TTS upstream produced an unexpected error.

503: Either the GPU slot is busy serving another inference (gateway-level), or the Qwen3-TTS container is unreachable. The error envelope’s provider field distinguishes the two.

Proxy to Qwen3-TTS (GET)

Transparent reverse proxy for /v1/audio/* to the Qwen3-TTS-12Hz-0.6B-CustomVoice container.

Transparent. Request body, headers (minus hop-by-hop), and response body all flow through unchanged. aistack does not transcode audio, swap voices, or adapt OpenAI’s spec — what Qwen3-TTS returns is what the client receives. The OpenAI-compatible request/response schemas are documented authoritatively at https://platform.openai.com/docs/api-reference/audio.

GPU scheduling. Holds the global gateway GPU slot for the duration of the upstream call. The Qwen3-TTS container generates on the same physical GPU as in-process ASR / LLM workloads, so the slot represents “GPU is doing inference” regardless of which process owns the kernels. Concurrent requests get HTTP 503 with Retry-After.

Streaming. The upstream is consumed and forwarded chunk-by-chunk so multi-MB audio responses don’t buffer entirely in memory. Client disconnect propagates: aistack closes the upstream connection so the container can abort generation early.

Error mapping. ConnectError on the upstream → 503 with a hint to start the docker compose stack. Other httpx errors → 502 with the upstream exception type/message in the envelope.

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | path | string | yes | |

Upstream Qwen3-TTS response, forwarded verbatim. For POST /v1/audio/speech the body is raw audio bytes (content-type per OpenAI spec); other paths under /v1/audio/* (clone-voice / list-voices / etc.) preserve the upstream’s content-type and shape.

  • audio/mpeg → string
  • audio/wav → string
  • audio/opus → string
  • application/json → object

502: Qwen3-TTS upstream produced an unexpected error.

503: Either the GPU slot is busy serving another inference (gateway-level), or the Qwen3-TTS container is unreachable. The error envelope’s provider field distinguishes the two.

Proxy to Qwen3-TTS (POST)

Transparent reverse proxy for /v1/audio/* to the Qwen3-TTS-12Hz-0.6B-CustomVoice container.

Transparent. Request body, headers (minus hop-by-hop), and response body all flow through unchanged. aistack does not transcode audio, swap voices, or adapt OpenAI’s spec — what Qwen3-TTS returns is what the client receives. The OpenAI-compatible request/response schemas are documented authoritatively at https://platform.openai.com/docs/api-reference/audio.

GPU scheduling. Holds the global gateway GPU slot for the duration of the upstream call. The Qwen3-TTS container generates on the same physical GPU as in-process ASR / LLM workloads, so the slot represents “GPU is doing inference” regardless of which process owns the kernels. Concurrent requests get HTTP 503 with Retry-After.
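On the client side, a 503 with Retry-After is usually handled with a bounded retry loop. The sketch below is one way to do that, assuming Retry-After carries integer seconds. `call_with_retry`, the attempt limit, and the fallback wait are hypothetical. Since 503 is also returned when the container is unreachable, a real client may want to inspect the envelope's provider field before retrying.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns something other than 503 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 503 or attempt == max_attempts:
            break
        # Header names are case-insensitive, so compare in lower case.
        raw = next((v for k, v in headers.items() if k.lower() == "retry-after"), "")
        # Only the integer-seconds form is handled; anything else uses the fallback.
        sleep(float(raw) if raw.strip().isdigit() else default_wait_s)
    return status, headers, body
```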

Streaming. The upstream is consumed and forwarded chunk-by-chunk so multi-MB audio responses don’t buffer entirely in memory. Client disconnect propagates: aistack closes the upstream connection so the container can abort generation early.

Error mapping. ConnectError on the upstream → 503 with a hint to start the docker compose stack. Other httpx errors → 502 with the upstream exception type/message in the envelope.

| Name | In | Type | Required | Description |
| --- | --- | --- | --- | --- |
| path | path | string | yes | |

Upstream Qwen3-TTS response, forwarded verbatim. For POST /v1/audio/speech the body is raw audio bytes (content-type per OpenAI spec); other paths under /v1/audio/* (clone-voice / list-voices / etc.) preserve the upstream’s content-type and shape.

  • audio/mpeg → string
  • audio/wav → string
  • audio/opus → string
  • application/json → object
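For orientation, a client call to POST /v1/audio/speech might look like the sketch below, using only the standard library. The body fields (`model`, `input`, `voice`, `response_format`) follow the OpenAI speech request schema; the base URL, and whichever model and voice values you pass, are assumptions that depend on your deployment. `build_speech_request` and `save_speech` are hypothetical helper names.

```python
import json
import urllib.request


def build_speech_request(
    base_url: str,
    text: str,
    voice: str,
    model: str,
    response_format: str = "wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request: urllib.request.Request, out_path: str, chunk_size: int = 64 * 1024) -> int:
    """Send the request and write the raw audio bytes to disk chunk by chunk."""
    total = 0
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total  # number of audio bytes written
```

Reading in fixed-size chunks mirrors the gateway's own streaming, so a long clip never has to sit fully in client memory either.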

502: Qwen3-TTS upstream produced an unexpected error.

503: Either the GPU slot is busy serving another inference (gateway-level), or the Qwen3-TTS container is unreachable. The error envelope’s provider field distinguishes the two.


Wire format for every non-2xx response from aistack.

The shape is identical regardless of which endpoint produced the error, so consumers can write one error-handling helper and reuse it across capabilities.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| error | ErrorBody | yes | |
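Because the envelope is uniform, a single client-side helper can cover every endpoint. The sketch below assumes only what is documented here: a top-level `error` object on non-2xx responses. The fields inside ErrorBody are not listed in this section, so the helper keeps the object as an opaque dict. `AistackError` and `raise_for_envelope` are hypothetical names.

```python
import json
from typing import Any, Dict


class AistackError(Exception):
    """Non-2xx response from aistack, carrying the parsed error body."""

    def __init__(self, status: int, error: Dict[str, Any]) -> None:
        super().__init__(f"aistack returned HTTP {status}: {error}")
        self.status = status
        self.error = error


def raise_for_envelope(status: int, body: bytes) -> None:
    """Raise AistackError for any non-2xx status; return None otherwise."""
    if 200 <= status < 300:
        return
    try:
        payload = json.loads(body)
        error = payload["error"]
        if not isinstance(error, dict):
            error = {}
    except (ValueError, KeyError, TypeError):
        # Body was not the expected JSON envelope; keep the status only.
        error = {}
    raise AistackError(status, error)
```

A caller can then branch on `exc.status` and, for 503, on `exc.error.get("provider")` to tell a busy gateway from an unreachable container.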