
Pocket TTS

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.
sherajdev
AI Intelligence · clawhub · v1.0.1 · 1 version · 99925.7 · Key: not required
★ 3 stars · 📥 2,630 downloads · 💾 323 installs · 1 version
#audio #latest #local #offline #text-to-speech #tts

Overview

Pocket TTS Skill

Fully local, offline text-to-speech using Kyutai's Pocket TTS model. Generate high-quality audio from text without any API calls or internet connection. Features 8 built-in voices, voice cloning support, and runs entirely on CPU.

Features

  • 🎯 Fully local - No API calls, runs completely offline
  • 🚀 CPU-only - No GPU required, works on any computer
  • Fast generation - ~2-6x real-time on CPU
  • 🎤 8 built-in voices - alba, marius, javert, jean, fantine, cosette, eponine, azelma
  • 🎭 Voice cloning - Clone any voice from a WAV sample
  • 🔊 Low latency - ~200ms first audio chunk
  • 📚 Simple Python API - Easy integration into any project
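
To make the "~2-6x real-time" figure concrete: a speedup of k is read here as "one second of compute produces about k seconds of audio". The sketch below only does that arithmetic; the function name is hypothetical and the interpretation is an assumption, not a measured benchmark.

```python
def estimated_generation_seconds(audio_seconds: float, speedup: float) -> float:
    """Rough compute time for a clip, given a real-time speedup factor.

    Assumes speedup = audio duration / compute time (an interpretation
    of the "~2-6x real-time" feature, not a measurement).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Bounds quoted in the feature list: 2x (slow end) and 6x (fast end).
for speedup in (2.0, 6.0):
    seconds = estimated_generation_seconds(60.0, speedup)
    print(f"60 s of audio at {speedup}x real-time: about {seconds:.0f} s of compute")
```

Under this reading, a one-minute clip would take roughly 10 to 30 seconds to generate on CPU.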

Installation

# 1. Accept the model license on Hugging Face
# https://huggingface.co/kyutai/pocket-tts

# 2. Install the package
pip install pocket-tts

# Or use uv for automatic dependency management
uvx pocket-tts generate "Hello world"

Usage

CLI

# Basic usage
pocket-tts "Hello, I am your AI assistant"

# With specific voice
pocket-tts "Hello" --voice alba --output hello.wav

# With custom voice file (voice cloning)
pocket-tts "Hello" --voice-file myvoice.wav --output output.wav

# Adjust speed
pocket-tts "Hello" --speed 1.2

# Start local server
pocket-tts --serve

# List available voices
pocket-tts --list-voices
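
When calling the CLI from another program, it can help to build the argument list in one place. The helper below is a hypothetical sketch: `build_pocket_tts_command` is not part of the package, and it only uses the flags shown in the examples above, assuming they behave as documented.

```python
import shlex


def build_pocket_tts_command(text, voice=None, voice_file=None, output=None, speed=None):
    """Build an argv list using only the flags shown in the CLI examples.

    Hypothetical helper for illustration; it does not run anything.
    """
    if voice and voice_file:
        raise ValueError("choose either a built-in voice or a voice file, not both")
    cmd = ["pocket-tts", text]
    if voice:
        cmd += ["--voice", voice]
    if voice_file:
        cmd += ["--voice-file", voice_file]
    if output:
        cmd += ["--output", output]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    return cmd


cmd = build_pocket_tts_command("Hello", voice="alba", output="hello.wav")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

With the package installed, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems in the text.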

Python API

from pocket_tts import TTSModel
import scipy.io.wavfile

# Load model
tts_model = TTSModel.load_model()

# Get voice state
voice_state = tts_model.get_state_for_audio_prompt(
    "hf://kyutai/tts-voices/alba-mackenna/casual.wav"
)

# Generate audio
audio = tts_model.generate_audio(voice_state, "Hello world!")

# Save to WAV
scipy.io.wavfile.write("output.wav", tts_model.sample_rate, audio.numpy())

# Check sample rate
print(f"Sample rate: {tts_model.sample_rate} Hz")

Available Voices

Voice     Description
--------  -------------------
alba      Casual female voice
marius    Male voice
javert    Clear male voice
jean      Natural male voice
fantine   Female voice
cosette   Female voice
eponine   Female voice
azelma    Female voice

Or use --voice-file /path/to/wav.wav for custom voice cloning.

Options

Option         Description             Default
-------------  ----------------------  ----------
text           Text to convert         Required
-o, --output   Output WAV file         output.wav
-v, --voice    Voice preset            alba
-s, --speed    Speech speed (0.5-2.0)  1.0
--voice-file   Custom WAV for cloning  None
--serve        Start HTTP server       False
--list-voices  List all voices         False
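
The option table can be mirrored with `argparse` to see how the defaults and the speed range fit together. This is an illustrative reconstruction, not the skill's actual parser. Restricting `--voice` to the eight built-in names and making `text` optional (so `--serve` and `--list-voices` can run without it) are assumptions.

```python
import argparse

BUILTIN_VOICES = ("alba", "marius", "javert", "jean",
                  "fantine", "cosette", "eponine", "azelma")


def speed_value(raw):
    """Parse a speed and enforce the documented 0.5-2.0 range."""
    value = float(raw)
    if not 0.5 <= value <= 2.0:
        raise argparse.ArgumentTypeError("speed must be between 0.5 and 2.0")
    return value


def build_parser():
    """Hypothetical parser that mirrors the option table."""
    parser = argparse.ArgumentParser(prog="pocket-tts")
    parser.add_argument("text", nargs="?", help="Text to convert")
    parser.add_argument("-o", "--output", default="output.wav", help="Output WAV file")
    parser.add_argument("-v", "--voice", default="alba", choices=BUILTIN_VOICES,
                        help="Voice preset")
    parser.add_argument("-s", "--speed", type=speed_value, default=1.0,
                        help="Speech speed (0.5-2.0)")
    parser.add_argument("--voice-file", default=None, help="Custom WAV for cloning")
    parser.add_argument("--serve", action="store_true", help="Start HTTP server")
    parser.add_argument("--list-voices", action="store_true", help="List all voices")
    return parser


args = build_parser().parse_args(["Hello", "--voice", "marius", "-s", "1.2"])
print(args.text, args.voice, args.speed, args.output)
```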

Requirements

  • Python 3.10-3.14
  • PyTorch 2.5+ (CPU version works)
  • Works on 2 CPU cores
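
A quick pre-install check for the Python requirement might look like the following. It is a minimal sketch that covers only the interpreter version range listed above; the function name is hypothetical, and PyTorch is deliberately not imported.

```python
import sys

MIN_PYTHON = (3, 10)  # lower bound from the requirements list
MAX_PYTHON = (3, 14)  # upper bound from the requirements list


def python_supported(version=None):
    """Return True if (major, minor) falls within the documented range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


print("Python version supported:", python_supported())
```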

Notes

  • ⚠️ Model is gated - accept license on Hugging Face first
  • 🌍 English language only (v1)
  • 💾 The first run downloads the model (~100M parameters)
  • 🔊 Audio is returned as a 1D torch tensor (PCM data)
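
Because the audio comes back as a 1D sequence of samples, it can also be written to a WAV file without SciPy. The sketch below uses only the standard library and assumes the samples are floats in [-1.0, 1.0], which the source does not confirm. `write_pcm16_wav` is a hypothetical helper, and the 24000 Hz rate in the demo is a placeholder; with the real model, use `tts_model.sample_rate` and pass `audio.tolist()`.

```python
import math
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Demo with a synthetic half-second 440 Hz tone instead of model output.
sample_rate = 24000  # placeholder value for the demo
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 2)]
demo_path = os.path.join(tempfile.gettempdir(), "pocket_tts_tone_demo.wav")
write_pcm16_wav(demo_path, tone, sample_rate)
print("wrote", demo_path)
```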

Version History

1 version in total

  • v1.0.1 (current)
    2026-03-28 13:58 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
