AI Intelligence

Local TTS

Local text-to-speech using Qwen3-TTS, with mlx_audio on macOS (Apple Silicon) or qwen-tts on Linux/Windows. Privacy-first, offline TTS that produces natural, realistic speech.
irachex
clawhub · v1.0.0 · 1 version · 100000 · API key: not required
★ 0 stars · 📥 637 downloads · 💾 19 installs · #latest

Overview

Local TTS with Qwen3-TTS

Privacy-First | Offline | High-Quality | Natural, Realistic Voices

Local text-to-speech synthesis using Qwen3-TTS models. Your text never leaves your machine.

Why Local TTS?

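Unlike cloud TTS services (Google, AWS, Azure), local-tts offers:

  • Zero data transmission - Text is synthesized entirely on-device
  • Works offline - No network required once the model weights have been downloaded
  • No API keys - No account or external service required
  • GDPR/HIPAA friendly - Text is never sent to a third-party processor, which simplifies compliance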

See privacy & security details.

Platform Overview

| Platform | Backend | Installation | Best For |
|---|---|---|---|
| macOS (Apple Silicon) | mlx_audio | pip install mlx-audio | M1/M2/M3/M4 Macs |
| Linux/Windows | qwen-tts | pip install qwen-tts | CUDA GPUs |
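The platform split above can be written as a small dispatch function. This is a sketch, not part of the skill. `pick_backend` is a hypothetical helper that only inspects `platform.system()` and `platform.machine()`. It does not check whether a CUDA GPU is actually present.

```python
import platform
from typing import Optional


def pick_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Suggest a backend package following the platform table (hypothetical helper)."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "mlx-audio"
    # Everything else (including Intel Macs, which the table does not cover) falls through.
    return "qwen-tts"


if __name__ == "__main__":
    print(f"Suggested install: pip install {pick_backend()}")
```

An x86_64 Python running under Rosetta reports `x86_64`, so this sketch would steer it to qwen-tts. MLX needs a native arm64 Python.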

Quick Start

macOS

pip install mlx-audio
brew install ffmpeg

# Natural female voice
python -m mlx_audio.tts.generate \
    --text "Hello world" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Chelsie
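To drive the same CLI from Python, for example inside a larger script, you can assemble the argument list and pass it to `subprocess.run`. This is a minimal sketch. `build_tts_command` and `speak` are hypothetical helpers, and they use only the flags shown in this README (`--text`, `--model`, `--voice`, `--instruct`).

```python
import subprocess
import sys
from typing import List, Optional

DEFAULT_MODEL = "mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit"


def build_tts_command(text: str, voice: Optional[str] = None,
                      model: str = DEFAULT_MODEL,
                      instruct: Optional[str] = None) -> List[str]:
    """Assemble argv for `python -m mlx_audio.tts.generate` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "mlx_audio.tts.generate",
           "--text", text, "--model", model]
    if voice:
        cmd += ["--voice", voice]
    if instruct:
        cmd += ["--instruct", instruct]  # optional style/emotion for CustomVoice
    return cmd


def speak(text: str, voice: str = "Chelsie") -> None:
    # Passing a list (not a shell string) avoids quoting problems with the text.
    subprocess.run(build_tts_command(text, voice=voice), check=True)
```

Usage: `speak("Hello world")` runs the same command as the shell example above, using the current Python interpreter.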

Linux/Windows

pip install qwen-tts

# With optimizations (FlashAttention, bfloat16, auto-device)
python scripts/tts_linux.py "Hello world" --female

Key Concepts

--voice vs --instruct (Important)

| Model | --voice | --instruct | Notes |
|---|---|---|---|
| CustomVoice | Select preset voice | Add style/emotion | Can be combined: voice + style control |
| VoiceDesign | N/A | Create voice from description | --instruct only |
| Base | N/A | N/A | For voice cloning with --ref_audio |

CustomVoice with style control:

python -m mlx_audio.tts.generate \
    --text "Hello there!" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Serena \
    --instruct "excited and enthusiastic"
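The rules in the model table can be checked before launching a run. This sketch encodes them as data. `check_flags` and the variant names `"CustomVoice"`, `"VoiceDesign"` and `"Base"` are illustrative and not part of mlx_audio or qwen-tts, which may report these mistakes differently or not at all. Treating `--voice` as required for CustomVoice is also an assumption.

```python
from typing import List, Optional


def check_flags(variant: str, voice: Optional[str] = None,
                instruct: Optional[str] = None,
                ref_audio: Optional[str] = None) -> List[str]:
    """Return a list of problems with the flag combination (empty if OK)."""
    problems = []
    if variant == "CustomVoice":
        # A preset voice is assumed required; --instruct optionally adds style/emotion.
        if not voice:
            problems.append("CustomVoice needs --voice (a preset voice)")
    elif variant == "VoiceDesign":
        # The voice is created from the --instruct description alone.
        if voice:
            problems.append("VoiceDesign does not use --voice")
        if not instruct:
            problems.append("VoiceDesign needs --instruct (a voice description)")
    elif variant == "Base":
        # Base models clone a voice from reference audio instead.
        if voice or instruct:
            problems.append("Base does not use --voice or --instruct")
        if not ref_audio:
            problems.append("Base needs --ref_audio for voice cloning")
    else:
        problems.append(f"unknown model variant: {variant}")
    return problems
```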

9 Preset Voices (Open Source CustomVoice)

| Voice | Gender | Language | Character |
|---|---|---|---|
| Chelsie | Female | English (American) | Gentle, empathetic |
| Serena | Female | English | Warm, gentle |
| Ono Anna | Female | Japanese | Playful |
| Sohee | Female | Korean | Warm |
| Aiden | Male | English (American) | Sunny |
| Dylan | Male | English | Natural |
| Eric | Male | English | Real |
| Ryan | Male | English | Natural |
| Uncle Fu | Male | Chinese | Youthful Beijing |

Defaults: Female=Serena, Male=Aiden
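A wrapper that offers `--female`/`--male` shortcuts needs the voice list and the defaults above. This sketch transcribes the table as data. `PRESET_VOICES`, `DEFAULTS` and `resolve_voice` are hypothetical names. The exact speaker identifiers a given model accepts (for example, how "Uncle Fu" and "Ono Anna" are spelled) are an assumption to verify against the model card.

```python
from typing import Dict, Optional, Tuple

# Preset voices from the table above: name -> (gender, language).
PRESET_VOICES: Dict[str, Tuple[str, str]] = {
    "Chelsie": ("Female", "English (American)"),
    "Serena": ("Female", "English"),
    "Ono Anna": ("Female", "Japanese"),
    "Sohee": ("Female", "Korean"),
    "Aiden": ("Male", "English (American)"),
    "Dylan": ("Male", "English"),
    "Eric": ("Male", "English"),
    "Ryan": ("Male", "English"),
    "Uncle Fu": ("Male", "Chinese"),
}

DEFAULTS = {"female": "Serena", "male": "Aiden"}


def resolve_voice(voice: Optional[str] = None, gender: Optional[str] = None) -> str:
    """Return an explicit preset voice, else the default for the requested gender."""
    if voice is not None:
        if voice not in PRESET_VOICES:
            raise ValueError(f"unknown preset voice: {voice!r}")
        return voice
    # Hypothetical choice: fall back to the female default when nothing is given.
    return DEFAULTS[(gender or "female").lower()]
```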

Usage Examples

CustomVoice (Preset Voices)

# Default female voice (Serena)
python -m mlx_audio.tts.generate \
    --text "Your text" --voice Serena --lang_code en \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

# Default male voice (Aiden)
python -m mlx_audio.tts.generate \
    --text "Your text" --voice Aiden --lang_code en \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit

VoiceDesign (Text-Based)

python -m mlx_audio.tts.generate \
    --text "Hello" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-VoiceDesign-8bit \
    --instruct "A warm female voice, professional and clear"

Long Text Generation

For long text, increase --max_tokens and enable --join_audio (macOS/MLX only):

python -m mlx_audio.tts.generate \
    --text "Your very long text here..." \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-CustomVoice-8bit \
    --voice Serena \
    --max_tokens 4096 \
    --join_audio \
    --output long_audio.wav
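`--join_audio` merges the per-segment outputs into one file. The sketch below shows the general idea using the standard-library `wave` module. It concatenates WAV files that share the same sample rate, channel count and sample width. It illustrates concatenation only and is not mlx_audio's implementation. `join_wavs` is a hypothetical helper.

```python
import wave
from typing import Sequence


def join_wavs(paths: Sequence[str], out_path: str) -> None:
    """Concatenate WAV files with identical formats into out_path."""
    if not paths:
        raise ValueError("no input files")
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        # Header frame count is patched automatically when the writer closes.
        out.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as seg:
                # Mixing formats would corrupt the output, so refuse instead.
                if (seg.getnchannels(), seg.getsampwidth(), seg.getframerate()) != (
                        params.nchannels, params.sampwidth, params.framerate):
                    raise ValueError(f"format mismatch in {path}")
                out.writeframes(seg.readframes(seg.getnframes()))
```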

Voice Cloning

python -m mlx_audio.tts.generate \
    --text "Cloned voice speaking" \
    --model mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit \
    --ref_audio sample.wav --ref_text "Sample transcript"

Parameters

| Parameter | Description | Values |
|---|---|---|
| --text | Text to speak | Required |
| --model | Model ID | See Models below |
| --voice | Preset voice (CustomVoice) | Chelsie, Serena, Aiden, Ryan... |
| --instruct | Voice description (VoiceDesign) or style/emotion (CustomVoice) | e.g., "excited", "calm", "professional" |
| --speed | Speaking rate | 0.5-2.0 (default: 1.0) |
| --pitch | Voice pitch | 0.5-2.0 (default: 1.0) |
| --lang_code | Language | en, cn, ja, ko, de, fr... |
| --ref_audio | Reference audio for cloning (Base models) | File path |
| --ref_text | Transcript of the reference audio | String |
| --output | Output file | Path (auto-generated if omitted) |
| --max_tokens | Maximum generation tokens | Integer (default: 2048); increase for long text |
| --join_audio | Merge audio segments into one file | Flag (pass it to enable); recommended for long text |
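The ranges in the parameter table can be enforced before calling the CLI. This is a sketch. `params_to_args` is hypothetical, and the numeric ranges are copied from the table. Options whose value is `None` or `False` are omitted so the CLI's own defaults apply. Boolean options such as `--join_audio` are emitted as bare switches, matching the Long Text example.

```python
from typing import Any, Dict, List

# Inclusive ranges taken from the parameter table.
RANGES = {"speed": (0.5, 2.0), "pitch": (0.5, 2.0)}


def params_to_args(params: Dict[str, Any]) -> List[str]:
    """Validate parameters and convert them to CLI arguments (hypothetical helper)."""
    if not params.get("text"):
        raise ValueError("--text is required")
    args: List[str] = []
    for name, value in params.items():
        if value is None or value is False:
            continue  # leave unset options to the CLI's defaults
        if name in RANGES:
            lo, hi = RANGES[name]
            if not lo <= float(value) <= hi:
                raise ValueError(f"--{name} must be between {lo} and {hi}")
        if name == "max_tokens" and (not isinstance(value, int) or value <= 0):
            raise ValueError("--max_tokens must be a positive integer")
        if value is True:
            args.append(f"--{name}")  # boolean switch such as --join_audio
        else:
            args += [f"--{name}", str(value)]
    return args
```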

Models

| Model | Size | Purpose |
|---|---|---|
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | 9 preset voices + style control |
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | Text-based voice creation |
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | Voice cloning |
| Qwen3-TTS-12Hz-0.6B-* | 0.6B | Lightweight versions |

macOS: Use the MLX conversions, which add the mlx-community/ prefix and a quantization suffix (e.g., mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit).
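In the examples in this README, going from a model name in the table to its macOS ID is mechanical: add the `mlx-community/` prefix and a quantization suffix. This sketch assumes the `-8bit` suffix used throughout this README. Other quantizations may or may not be published under mlx-community, so check before relying on them. `mlx_model_id` is a hypothetical helper.

```python
def mlx_model_id(model: str, quant: str = "8bit") -> str:
    """Map 'Qwen3-TTS-12Hz-1.7B-Base' to 'mlx-community/Qwen3-TTS-12Hz-1.7B-Base-8bit'."""
    prefix = "mlx-community/"
    if model.startswith(prefix):
        return model  # already a full MLX model ID; leave it unchanged
    # Assumed naming convention, taken from the examples in this README.
    return f"{prefix}{model}-{quant}"
```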

Scripts

  • scripts/tts_macos.py - macOS wrapper
  • scripts/tts_linux.py - Linux/Windows wrapper with optimizations

Optimizations (Linux/Windows)

tts_linux.py automatically enables:

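  • FlashAttention - Faster attention with lower memory use (requires the flash-attn package and a supported NVIDIA GPU)
  • bfloat16 - Half the memory of float32, with a wider numeric range than float16
  • Auto device - Uses CUDA when available, otherwise falls back to CPU
  • Mixed precision - Faster inference with little loss in output quality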
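If you load a model yourself, you can make the same choices explicitly. This sketch covers the selection logic only and is not the contents of `tts_linux.py`. `choose_runtime` and `detect_runtime` are hypothetical. The returned keys mirror common PyTorch / Hugging Face Transformers loading options (`device_map`, `dtype`, `attn_implementation`), but whether qwen-tts accepts exactly these arguments is an assumption to check against its documentation.

```python
import importlib.util
from typing import Dict


def choose_runtime(cuda_available: bool, flash_attn_installed: bool) -> Dict[str, str]:
    """Pick device, dtype, and attention backend (pure function, easy to test)."""
    if not cuda_available:
        # CPU fallback: full precision and PyTorch's built-in SDPA attention.
        return {"device_map": "cpu", "dtype": "float32", "attn_implementation": "sdpa"}
    return {
        "device_map": "cuda:0",
        "dtype": "bfloat16",  # half the memory of float32
        # Use FlashAttention only if the flash-attn package is importable.
        "attn_implementation": "flash_attention_2" if flash_attn_installed else "sdpa",
    }


def detect_runtime() -> Dict[str, str]:
    """Query the local machine; imports torch lazily so the module loads without it."""
    import torch  # requires PyTorch to be installed

    return choose_runtime(
        cuda_available=torch.cuda.is_available(),
        flash_attn_installed=importlib.util.find_spec("flash_attn") is not None,
    )
```

Keeping the decision in a pure function means you can test the fallback order without a GPU. `detect_runtime` is the only part that touches the hardware.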

Troubleshooting

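| Issue | Solution |
|---|---|
| macOS: Model not found | Use the mlx-community/ prefix |
| macOS: Audio format issues | brew install ffmpeg |
| Linux: CUDA out of memory (OOM) | Use the 0.6B models |
| Linux: Slow generation | Check that PyTorch sees the GPU: torch.cuda.is_available() should return True |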
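For the slow-Linux case, a slightly fuller diagnostic than the one-liner helps tell three situations apart: PyTorch is missing, a CPU-only PyTorch build is installed, or no GPU is visible. This is a sketch. `cuda_report` is hypothetical and uses only `torch.cuda.is_available()`, `torch.version.cuda` and `torch.cuda.get_device_name()`.

```python
from typing import Dict, Optional, Union


def cuda_report() -> Dict[str, Union[bool, str, None]]:
    """Summarize why generation might be running on the CPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_build": None,
                "cuda_available": False, "device": None}
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        # None here means a CPU-only PyTorch build is installed.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```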


Version

1.0.0 - See VERSION and package.json

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 17:17 · Security: Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

ai-intelligence

self-improving agent

pskoett
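Captures lessons learned, errors, and corrections to enable continuous improvement. When to use: (1) a command or operation fails unexpectedly; (2) the user corrects…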
★ 4,055 📥 795,910
ai-intelligence

Proactive Agent

halthelobster
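Upgrades an AI agent from a task executor into an intelligent partner that proactively anticipates needs and continuously optimizes. Integrates the WAL protocol, a working buffer, autonomous scheduled tasks, and battle-tested patterns. A core component of the Hal Stack 🦞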
★ 833 📥 212,777
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
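Self-reflection + self-criticism + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and continuously improves.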
★ 1,349 📥 317,697