# Qwen3-TTS Docker Quick Deployment Guide

## Choosing a Model Version

**Currently deployed model: Qwen3-TTS-24Hz-1.7B-Base-VoiceClone**

This is the most full-featured, highest-quality model in the Qwen3-TTS series, and it supports high-quality voice cloning.

### Model Comparison

| Model | Sample rate | Parameters | Size | Audio quality | Speed | Special features | Best for |
|-------|-------------|------------|------|---------------|-------|------------------|----------|
| Qwen3-TTS-12Hz-0.6B-CustomVoice | 12kHz | 0.6B | 1.7GB | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 9 preset voices | Speed first |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 12kHz | 1.7B | ~4GB | ⭐⭐⭐⭐ | ⭐⭐⭐ | 9 preset voices | Higher audio quality |
| Qwen3-TTS-24Hz-0.6B-CustomVoice | 24kHz | 0.6B | ~2GB | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 9 preset voices | High-quality output |
| Qwen3-TTS-24Hz-1.7B-CustomVoice | 24kHz | 1.7B | ~5GB | ⭐⭐⭐⭐⭐ | ⭐⭐ | 9 preset voices | Highest quality (no cloning) |
| Qwen3-TTS-12Hz-Base-VoiceClone | 12kHz | 0.6B | ~2GB | ⭐⭐⭐ | ⭐⭐⭐⭐ | Voice cloning | Custom voices |
| **Qwen3-TTS-24Hz-1.7B-Base-VoiceClone** | **24kHz** | **1.7B** | **~6.8GB** | **⭐⭐⭐⭐⭐** | **⭐⭐** | **Voice cloning** | **Most full-featured** |

**Features of the current model:**
- **24kHz sample rate**: twice that of the 12Hz models, for clearer and more natural audio
- **1.7B parameters**: the greatest expressive capacity in the series
- **Voice cloning**: supports training and generating custom voices
- **Most full-featured**: combines the base model with voice cloning

**Recommendations:**
- **Default (current choice)**: `Qwen3-TTS-24Hz-1.7B-Base-VoiceClone` (most full-featured, highest audio quality)
- **Speed first**: `Qwen3-TTS-12Hz-0.6B-CustomVoice`
- **High quality without cloning**: `Qwen3-TTS-24Hz-1.7B-CustomVoice`

---

## Quick Start (3 Steps)

### Step 1: Download the Models

```bash
# Create and enter the model directory
mkdir -p ~/Qwen3-TTS/model
cd ~/Qwen3-TTS/model

# Install the ModelScope CLI (recommended for users in mainland China)
python3 -m pip install -U modelscope

# Download the tokenizer (about 651MB)
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-24Hz --local_dir ./Qwen3-TTS-Tokenizer-24Hz

# Download the Qwen3-TTS-24Hz-1.7B-Base-VoiceClone model (about 6.8GB)
# This is the most full-featured, highest-quality model and supports voice cloning
modelscope download --model Qwen/Qwen3-TTS-24Hz-1.7B-Base-VoiceClone --local_dir ./Qwen3-TTS-24Hz-1.7B-Base-VoiceClone
```

**Notes:**
- **ModelScope**: Alibaba Cloud's model hosting platform, with fast downloads inside mainland China
- **Tokenizer**: converts text into tokens the model can process (24Hz version)
- **TTS model**: the core model, with 1.7B parameters, a 24kHz sample rate, and voice cloning

### Step 2: Create the Files

```bash
cd ~/Qwen3-TTS
```

#### Create app.py (preconfigured for the 24Hz-1.7B-Base-VoiceClone model)

```bash
cat > app.py << 'EOF'
import base64
import io
import sys
import traceback

import soundfile as sf
import torch
from fastapi import Body, FastAPI

# Add /app to the import path
sys.path.insert(0, '/app')

from qwen_tts import Qwen3TTSModel

# ========== Current configuration: Qwen3-TTS-24Hz-1.7B-Base-VoiceClone ==========
# The most full-featured, highest-quality model, with voice cloning
# - 24kHz sample rate: clearer, more natural audio
# - 1.7B parameters: the greatest expressive capacity
# - Base-VoiceClone: supports custom voice cloning
MODEL_PATH = '/app/model/Qwen3-TTS-24Hz-1.7B-Base-VoiceClone'
TOKENIZER_PATH = '/app/model/Qwen3-TTS-Tokenizer-24Hz'

# To switch to another model, change the two paths above and rebuild the image
# ==========================================================

app = FastAPI(title="Qwen3-TTS API")

# Global model instance, loaded once at startup
model = None

@app.on_event("startup")
async def startup_event():
    global model
    print("正在加载 TTS 模型...")
    print(f"模型路径: {MODEL_PATH}")
    print("模型版本: Qwen3-TTS-24Hz-1.7B-Base-VoiceClone")
    print("提示: 1.7B 模型加载需要较长时间,请耐心等待...")
    model = Qwen3TTSModel.from_pretrained(
        MODEL_PATH,
        tokenizer_path=TOKENIZER_PATH,
        device_map='cpu',
        dtype=torch.float32
    )
    print("TTS 模型初始化完成")

@app.get("/")
async def root():
    return {
        "status": "running",
        "mode": "production",
        "model": "Qwen3-TTS-24Hz-1.7B-Base-VoiceClone",
        "features": ["voice_clone", "high_quality", "24khz"]
    }

@app.post("/tts")
async def tts(text: str = Body(..., media_type='application/json')):
    """
    Note: the request body must be parsed with Body(...); otherwise the endpoint returns 422.
    Request format: POST /tts with a JSON string as the body, e.g. "text to synthesize"

    Timeout: long texts can take a long time; clients should use a timeout >= 180 seconds
    CPU inference speed: roughly 1-3 seconds per character (the 1.7B model is about 2-3x slower than the 0.6B model)
    """
    if model is None:
        return {"code": 500, "msg": "模型未初始化"}

    try:
        print(f"收到TTS请求,文本长度: {len(text)} 字符")
        wavs, sr = model.generate_custom_voice(text, speaker='serena')
        print(f"音频生成完成,采样率: {sr}, 音频时长: {len(wavs[0])/sr:.2f}秒")

        # Encode the waveform as WAV in memory
        buffer = io.BytesIO()
        sf.write(buffer, wavs[0], sr, format='WAV')
        buffer.seek(0)

        # Base64-encode the WAV bytes so they fit in a JSON response
        audio_data = buffer.read()
        audio_b64 = base64.b64encode(audio_data).decode('utf-8')
        print(f"编码完成,base64 长度: {len(audio_b64)}")

        return {
            "code": 0,
            "msg": "success",
            "text": text,
            "audio": audio_b64
        }
    except Exception as e:
        print(f"Error: {traceback.format_exc()}")
        return {"code": 500, "msg": f"TTS处理错误: {str(e)}"}
EOF
```

#### Create requirements.txt

```bash
cat > requirements.txt << 'EOF'
fastapi>=0.104.0
uvicorn>=0.24.0
numpy>=1.24.0
torch>=2.0.0
librosa>=0.10.0
soundfile>=0.12.0
safetensors>=0.4.0
qwen-tts>=0.1.0
EOF
```

#### Create the Dockerfile

```bash
cat > Dockerfile << 'EOF'
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    ffmpeg \
    libsndfile1 \
    && rm -rf /var/lib/apt/lists/*

# Install qwen-tts (provides the qwen_tts module)
RUN pip install --no-cache-dir qwen-tts

# Copy the application files
COPY app.py .
COPY requirements.txt .

# Install the remaining dependencies
RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 8000

# Raise the keep-alive timeout to 180 seconds (the 1.7B model needs longer) and limit concurrency to 1
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--timeout-keep-alive", "180", "--limit-concurrency", "1"]
EOF
```

### Step 3: Build and Start

```bash
cd ~/Qwen3-TTS

# Build the image
docker build -t qwen3-tts:latest .

# Start the container (mount the model directory read-only and set the memory limit)
docker run -d \
  --name tts-service \
  -p 8000:8000 \
  -v ~/Qwen3-TTS/model:/app/model:ro \
  --memory="8g" \
  --restart unless-stopped \
  qwen3-tts:latest

# Wait for the service to start (the 1.7B model takes longer to load, about 15-30 seconds)
sleep 20

# Verify the service
curl http://localhost:8000/
```

**Expected output:**
```json
{
  "status": "running",
  "mode": "production",
  "model": "Qwen3-TTS-24Hz-1.7B-Base-VoiceClone",
  "features": ["voice_clone", "high_quality", "24khz"]
}
```
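
A fixed `sleep 20` can be too short on a slow machine. As an alternative, the sketch below polls the health endpoint until it reports `"status": "running"`. The helper names `probe` and `wait_until_ready`, the 120-second default, and the 2-second interval are illustrative assumptions, not part of the service.

```python
import json
import time
import urllib.error
import urllib.request


def probe(url="http://localhost:8000/"):
    """Return True once the health endpoint answers with status 'running'."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: not ready yet.
        return False
    return isinstance(data, dict) and data.get("status") == "running"


def wait_until_ready(check=probe, timeout=120.0, interval=2.0, sleep=time.sleep):
    """Call check() every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready()` after `docker run` returns `True` as soon as the model has loaded, or `False` if the service has not come up within the timeout, in which case `docker logs tts-service` is the next place to look.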

---

## Switching to Another Model Version

### Switch to the speed-first model (12Hz-0.6B-CustomVoice)

```bash
# 1. Download the new model and its tokenizer
cd ~/Qwen3-TTS/model
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./Qwen3-TTS-Tokenizer-12Hz

# 2. Edit MODEL_PATH and TOKENIZER_PATH in app.py
# MODEL_PATH = '/app/model/Qwen3-TTS-12Hz-0.6B-CustomVoice'
# TOKENIZER_PATH = '/app/model/Qwen3-TTS-Tokenizer-12Hz'

# 3. Rebuild and restart (remove the old container so its name can be reused)
cd ~/Qwen3-TTS
docker stop tts-service && docker rm tts-service
docker build -t qwen3-tts:v12hz .
docker run -d --name tts-service -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro qwen3-tts:v12hz
```

### Switch to the highest-quality model (24Hz-1.7B-CustomVoice, no voice cloning)

```bash
# 1. Download the new model
cd ~/Qwen3-TTS/model
modelscope download --model Qwen/Qwen3-TTS-24Hz-1.7B-CustomVoice --local_dir ./Qwen3-TTS-24Hz-1.7B-CustomVoice

# 2. Edit the paths in app.py
# MODEL_PATH = '/app/model/Qwen3-TTS-24Hz-1.7B-CustomVoice'
# TOKENIZER_PATH = '/app/model/Qwen3-TTS-Tokenizer-24Hz'

# 3. Rebuild and restart (remove the old container so its name can be reused)
cd ~/Qwen3-TTS
docker stop tts-service && docker rm tts-service
docker build -t qwen3-tts:v24hz-noclone .
docker run -d --name tts-service -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro qwen3-tts:v24hz-noclone
```

---

## Using the API

### Health check

```bash
curl http://localhost:8000/
```

### Text to speech

```bash
curl -X POST http://localhost:8000/tts \
  -H "Content-Type: application/json" \
  -d '"你好,这是一个测试"'
```

**Note:** the body must be a JSON string, not a JSON object such as `{"text": "..."}`.

**Example response:**
```json
{
  "code": 0,
  "msg": "success",
  "text": "你好,这是一个测试",
  "audio": "UklGRiTCAQBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAAZGF0YQD..."
}
```
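
The `audio` field is a base64-encoded WAV file. A minimal sketch of decoding it and reading the header with Python's standard `wave` module follows; `describe_wav` is a hypothetical helper name, and the code assumes the payload is plain PCM WAV.

```python
import base64
import io
import wave


def describe_wav(audio_b64):
    """Decode the base64 `audio` field and return basic WAV properties."""
    raw = base64.b64decode(audio_b64)
    with wave.open(io.BytesIO(raw), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

For this deployment, the result should show one channel, a 2-byte sample width, and a sample rate of 24000; a different sample rate suggests that another model or tokenizer is loaded.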
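
### Handling long text

**Important:**
- The 1.7B model is slow on CPU (about 1-3 seconds per character)
- Short text (< 20 characters): about 15-30 seconds
- Medium text (20-50 characters): about 50-120 seconds
- Long text (50-100 characters): about 120-240 seconds

**Clients must set a timeout of at least 180 seconds**, and longer for texts near 100 characters, which can take up to about 240 seconds according to the estimates above. Otherwise the request fails with an `EOF` error.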
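
The per-character figures above can be turned into a client timeout. The sketch below is a rough rule of thumb that takes the pessimistic 3 seconds per character and never goes below the 180-second minimum; `estimate_timeout` and its defaults are illustrative assumptions, not measured values.

```python
def estimate_timeout(text, secs_per_char=3.0, floor=180.0):
    """Return a client timeout in seconds for synthesizing `text` on CPU.

    Uses the worst-case per-character cost, but never less than `floor`.
    """
    return max(floor, len(text) * secs_per_char)
```

With these defaults, a 100-character text gets a 300-second timeout, which covers the 120-240 second range quoted above, while short texts fall back to 180 seconds.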

### Go example (with timeout)

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// TTSResponse mirrors the JSON returned by POST /tts.
type TTSResponse struct {
	Code  int    `json:"code"`
	Msg   string `json:"msg"`
	Text  string `json:"text"`
	Audio string `json:"audio"`
}

// TTS sends text to the service and returns the decoded WAV bytes.
func TTS(text string) ([]byte, error) {
	// The body must be a JSON string. json.Marshal adds the surrounding
	// quotes and escapes any quotes or backslashes inside the text.
	body, err := json.Marshal(text)
	if err != nil {
		return nil, err
	}

	// HTTP client with a 180-second timeout (the 1.7B model needs longer)
	client := &http.Client{
		Timeout: 180 * time.Second,
	}

	resp, err := client.Post(
		"http://localhost:8000/tts",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var result TTSResponse
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}

	if result.Code != 0 {
		return nil, fmt.Errorf("TTS error: %s", result.Msg)
	}

	return base64.StdEncoding.DecodeString(result.Audio)
}

func main() {
	// Long-text test
	longText := "欢迎使用红动未来数字人服务平台,我们将为您提供最优质的AI数字人解决方案。人工智能技术正在改变我们的生活,让我们一起探索未来的无限可能。"

	audio, err := TTS(longText)
	if err != nil {
		panic(err)
	}

	if err := os.WriteFile("output.wav", audio, 0644); err != nil {
		panic(err)
	}
	fmt.Println("音频已保存到 output.wav")
}
```

### Python example (with timeout)

```python
import base64

import requests


def tts(text, timeout=180):
    """
    Convert text to speech.

    Args:
        text: the text to synthesize
        timeout: timeout in seconds; >= 180 is recommended for the 1.7B model
    """
    # The body must be a JSON string. Passing json=text serializes and escapes
    # the text and sets the Content-Type: application/json header.
    response = requests.post(
        "http://localhost:8000/tts",
        json=text,
        timeout=timeout,
    )
    # Surface HTTP-level failures such as 422 before reading the payload
    response.raise_for_status()

    result = response.json()
    if result["code"] != 0:
        raise RuntimeError(f"TTS error: {result['msg']}")

    return base64.b64decode(result["audio"])


# Usage
short_text = "你好,这是一个测试"
long_text = "欢迎使用红动未来数字人服务平台,我们将为您提供最优质的AI数字人解决方案。"

# Short text (usually 15-30 seconds, so allow 60 seconds)
audio_data = tts(short_text, timeout=60)
with open("short_output.wav", "wb") as f:
    f.write(audio_data)

# Long text (180-second timeout)
audio_data = tts(long_text, timeout=180)
with open("long_output.wav", "wb") as f:
    f.write(audio_data)
```

---

## Managing the Service

```bash
# Follow the logs
docker logs -f tts-service

# Show the last 50 log lines
docker logs --tail 50 tts-service

# Stop the service
docker stop tts-service

# Start the service
docker start tts-service

# Restart the service
docker restart tts-service

# Remove the container
docker stop tts-service && docker rm tts-service

# Remove the image
docker rmi qwen3-tts:latest

# Open a shell inside the container
docker exec -it tts-service /bin/bash
```

---

## Troubleshooting

### 1. Port already in use

```bash
# Find the process using port 8000
lsof -ti:8000

# Kill the process using the port
lsof -ti:8000 | xargs kill -9

# Or map a different host port (keep the model mount, or the model will fail to load)
docker run -d -p 8001:8000 --name tts-service -v ~/Qwen3-TTS/model:/app/model:ro qwen3-tts:latest
```

### 2. API returns 422

**Cause:** the request is malformed; the body must be a JSON string.

**Correct request:**
```bash
curl -X POST http://localhost:8000/tts \
  -H "Content-Type: application/json" \
  -d '"你好"'
```

**Incorrect request:**
```bash
# ❌ Wrong: this is a JSON object, not a string
curl -X POST http://localhost:8000/tts \
  -d '{"text": "你好"}'
```

### 3. Long text returns an EOF error

**Cause:** the 1.7B model is slow, and processing a long text takes longer than the client timeout.

**Solutions:**
1. **Set the client timeout to >= 180 seconds**
2. **Shorten the text** (keep each request under 100 characters)
3. **Use GPU acceleration** (if available)
4. **Switch to a 0.6B model** (faster, with slightly lower quality)
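
To keep each request under 100 characters, a long text can be split on the client before it is sent. The sketch below is one possible approach: it cuts after sentence-ending punctuation where it can and hard-wraps only when a single sentence is too long. `split_text` and the punctuation set are illustrative assumptions.

```python
import re

# Split points: just after Chinese or Western sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[。!?;.!?;])")


def split_text(text, max_chars=100):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text):
        # Hard-wrap a single sentence that is itself longer than max_chars.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk when adding this sentence would exceed the limit.
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to `/tts` as its own request and the resulting audio segments played or concatenated in order. No characters are dropped, so joining the chunks reproduces the original text.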

**Go client:**
```go
client := &http.Client{
	Timeout: 180 * time.Second, // 180-second timeout
}
```

**Python client:**
```python
response = requests.post(
    "http://localhost:8000/tts",
    json=text,    # sent as a JSON string with Content-Type: application/json
    timeout=180,  # 180-second timeout
)
```

### 4. Audio is silent or the file is too small

```bash
# Check the logs to confirm the model loaded
docker logs tts-service 2>&1 | grep "TTS 模型初始化完成"

# Check the model files
docker exec tts-service ls -la /app/model/

# Check the size of the audio data returned by the API (should be > 10KB)
curl -s -X POST http://localhost:8000/tts \
  -H "Content-Type: application/json" \
  -d '"测试"' | python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d['audio']))"
```
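
Size alone does not show whether the audio is audible. As a complementary check, the sketch below reads the decoded WAV bytes and reports the peak sample value; it assumes 16-bit PCM audio, and `peak_amplitude`, `looks_silent`, and the threshold of 100 are illustrative choices rather than calibrated values.

```python
import array
import io
import sys
import wave


def peak_amplitude(wav_bytes):
    """Return the largest absolute sample value (0..32768) in a 16-bit PCM WAV."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return max((abs(s) for s in samples), default=0)


def looks_silent(wav_bytes, threshold=100):
    """Treat the clip as silent when no sample reaches `threshold`."""
    return peak_amplitude(wav_bytes) < threshold
```

Pass in the bytes obtained by base64-decoding the `audio` field. A peak near zero points at the model or speaker configuration rather than at the transport.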

### 5. Out of memory

```bash
# Remove the existing container first so its name can be reused
docker stop tts-service && docker rm tts-service

# Raise the memory limit (>= 8GB recommended for the 1.7B model)
docker run -d --name tts-service -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro --memory="8g" qwen3-tts:latest

# Or allow even more memory
docker run -d --name tts-service -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro --memory="12g" qwen3-tts:latest
```

### 6. Service unreachable after startup

```bash
# Check the container status
docker ps | grep tts-service

# Check the port mapping
docker port tts-service

# Check that the service responds
curl http://localhost:8000/
```

### 7. Inference is too slow

**Tuning options for the 1.7B model** (remove the existing container before each `docker run`):
```bash
# Limit concurrency to 1 to avoid CPU contention (the Dockerfile CMD already sets this)
docker run -d --name tts-service -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro qwen3-tts:latest \
  uvicorn app:app --host 0.0.0.0 --port 8000 --limit-concurrency 1

# Give the container more CPU
docker run -d --name tts-service -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro --cpus="8.0" qwen3-tts:latest
```

**GPU acceleration (requires an NVIDIA GPU):**
```bash
# Change device_map='cpu' to device_map='cuda:0' in app.py,
# rebuild the image, then run it with GPU access
docker run -d --name tts-service --gpus all -p 8000:8000 -v ~/Qwen3-TTS/model:/app/model:ro qwen3-tts:latest
```

**Switch to a faster model:**
- If speed matters most, switch to `Qwen3-TTS-12Hz-0.6B-CustomVoice`
- Inference can be 3-5x faster

### 8. Model fails to load

**Check the model paths:**
```bash
# List the model directory inside the container
docker exec tts-service ls -la /app/model/

# Confirm that MODEL_PATH and TOKENIZER_PATH in app.py are correct
docker exec tts-service cat /app/app.py | grep "MODEL_PATH\|TOKENIZER_PATH"
```

---

## FAQ

**Q: Why choose Qwen3-TTS-24Hz-1.7B-Base-VoiceClone?**

A: It is the most full-featured, highest-quality model in the Qwen3-TTS series:
- **24kHz sample rate**: twice that of the 12Hz models, for clearer and more natural audio
- **1.7B parameters**: the greatest expressive capacity in the series
- **Voice cloning**: supports training and generating custom voices

**Q: What can I do if the 1.7B model is too slow?**

A: Any of the following helps:
1. Set the client timeout to >= 180 seconds
2. Use GPU acceleration (roughly 10-20x faster)
3. Switch to a 0.6B model (roughly 3-5x faster)
4. Shorten the text in each request

**Q: Why must the body be a JSON string rather than a JSON object?**

A: With `Body(..., media_type='application/json')`, FastAPI reads the whole body as a single JSON string. Accepting the `{"text": "..."}` form would require changing `app.py` to use a Pydantic model. The current implementation is simpler: just send the string.
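
The difference between the two body shapes is easy to see by serializing the same text both ways. The snippet below uses only the standard `json` module; the variable names and sample text are illustrative.

```python
import json

text = '他说:"你好"'

# What the current endpoint expects: the body is a bare JSON string.
# json.dumps adds the surrounding quotes and escapes the inner ones.
string_body = json.dumps(text, ensure_ascii=False)

# What a Pydantic-model endpoint would expect instead: a JSON object.
object_body = json.dumps({"text": text}, ensure_ascii=False)

print(string_body)
print(object_body)
```

Building the string body by hand, for example by wrapping the text in quotes, produces invalid JSON as soon as the text itself contains a double quote, which is why a JSON serializer is the safer choice.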

**Q: What is the difference between the 24Hz and 12Hz models?**

A:
- **24Hz**: 24kHz sample rate, clearer and more natural audio, suited to high-quality output
- **12Hz**: 12kHz sample rate, faster inference, suited to real-time applications

**Q: What is the difference between the 1.7B and 0.6B models?**

A:
- **1.7B**: more parameters and higher audio quality, but slower inference and higher memory use (GPU recommended)
- **0.6B**: fewer parameters, faster inference, and lower memory use; suited to CPU environments

**Q: What is the difference between the CustomVoice and Base models?**

A:
- **CustomVoice**: ships with 9 preset voices and works out of the box
- **Base**: the base model, which supports custom training and voice cloning

**Q: How long does long-text inference take?**

A: On CPU, the 1.7B model takes about 1-3 seconds per character:
- Short text (< 20 characters): about 15-30 seconds
- Medium text (20-50 characters): about 50-120 seconds
- Long text (50-100 characters): about 120-240 seconds

**Q: Why does long text return an EOF error?**

A: The 1.7B model takes a long time to run, and a client with a short timeout drops the connection before the response arrives. Solutions:
1. Set the client timeout to >= 180 seconds
2. Shorten the text in each request
3. Use GPU acceleration
4. Switch to a 0.6B model

**Q: Which voices are supported?**

A: The CustomVoice models support 9 preset voices: serena, vivian, uncle_fu, ryan, aiden, ono_anna, sohee, eric, dylan

**Q: Can I use a custom voice?**

A: Yes. The Base-VoiceClone models support voice cloning; see the official Qwen3-TTS documentation for details.

**Q: Which languages are supported?**

A: Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian

**Q: What is the audio sample rate?**

A: 24kHz (24000 Hz). That is lower than CD quality (44.1kHz) but much clearer than the 12Hz models.

**Q: What format is the generated audio?**

A: WAV, Microsoft PCM, 16 bit, mono

**Q: How can I speed up inference?**

A:
1. Use GPU acceleration (`device_map='cuda:0'`)
2. Use a smaller model (0.6B instead of 1.7B)
3. Use a 12Hz model instead of a 24Hz model
4. Limit concurrent requests (`--limit-concurrency 1`)
5. Give the container more CPU cores (`--cpus="8.0"`)

---

## Directory Structure

```
~/Qwen3-TTS/
├── app.py                    # FastAPI service code
├── Dockerfile                # Docker image build file
├── requirements.txt          # Python dependencies
└── model/                    # Model files
    ├── Qwen3-TTS-Tokenizer-24Hz/              # 24Hz tokenizer (651MB)
    └── Qwen3-TTS-24Hz-1.7B-Base-VoiceClone/   # 24Hz 1.7B model (6.8GB)
```

---

## Related Links

- Official documentation: https://github.com/QwenLM/Qwen3-TTS
- ModelScope: https://modelscope.cn/models?name=Qwen3-TTS
- FastAPI documentation: https://fastapi.tiangolo.com/