Overview

WhatsApp Voice Talk

Turn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: voice → transcription → intent detection → response generation → text-to-speech.

Perfect for:

Voice assistants on WhatsApp
Hands-free command interfaces
Multi-lingual chatbots
IoT voice control (drones, smart home, etc.)

Quick Start

1. Install Dependencies

pip install openai-whisper soundfile numpy

2. Process a Voice Message

const { processVoiceNote } = require('./scripts/voice-processor');
const fs = require('fs');

async function main() {
  // Read a voice message (OGG, WAV, MP3, etc.)
  const buffer = fs.readFileSync('voice-message.ogg');

  // Process it (in CommonJS, await must run inside an async function)
  const result = await processVoiceNote(buffer);

  console.log(result);
  // {
  //   status: 'success',
  //   response: "Current weather in Delhi is 19°C, haze. Humidity is 56%.",
  //   transcript: "What's the weather today?",
  //   intent: 'weather',
  //   language: 'en',
  //   timestamp: 1769860205186
  // }
}

main().catch(console.error);

3. Run Auto-Listener

For automatic processing of incoming WhatsApp voice messages:

node scripts/voice-listener-daemon.js

The daemon polls ~/.clawdbot/media/inbound/ every 5 seconds and processes any new voice files it finds.
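
The polling behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of voice-listener-daemon.js: the helper names (findNewVoiceFiles, startPolling), the extension list, and the in-memory "seen" set are all assumptions made for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Values taken from the description above; the real daemon may configure them differently.
const INBOUND_DIR = path.join(os.homedir(), '.clawdbot', 'media', 'inbound');
const POLL_INTERVAL_MS = 5000;
// Hypothetical filter: which file extensions count as voice messages.
const AUDIO_EXTENSIONS = new Set(['.ogg', '.wav', '.mp3', '.flac']);

// Return audio files in `dir` that are not yet in `seen`, and mark them as seen.
function findNewVoiceFiles(dir, seen) {
  if (!fs.existsSync(dir)) return [];
  const fresh = [];
  for (const name of fs.readdirSync(dir).sort()) {
    const ext = path.extname(name).toLowerCase();
    if (AUDIO_EXTENSIONS.has(ext) && !seen.has(name)) {
      seen.add(name);
      fresh.push(path.join(dir, name));
    }
  }
  return fresh;
}

// Start polling; `onFile` is called with the path of each newly discovered file.
function startPolling(onFile, dir = INBOUND_DIR, intervalMs = POLL_INTERVAL_MS) {
  const seen = new Set();
  return setInterval(() => {
    for (const file of findNewVoiceFiles(dir, seen)) onFile(file);
  }, intervalMs);
}
```

A caller would pass a callback that reads the file and hands the buffer to processVoiceNote. Because the "seen" set lives in memory, a restart would reprocess existing files unless they are moved or deleted after handling.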

How It Works

Incoming Voice Message
        ↓
    Transcribe (local Whisper model)
        ↓
  "What's the weather?"
        ↓
  Detect Language & Intent
        ↓
   Match against INTENTS
        ↓
   Execute Handler
        ↓
   Generate Response
        ↓
   Convert to Speech (TTS)
        ↓
  Send back via WhatsApp

Key Features

✅ Zero Setup Complexity - No FFmpeg, no complex dependencies. Uses soundfile + Whisper.

✅ Multi-Language - Automatic English/Hindi detection. Extend easily.

✅ Intent-Driven - Define custom intents with keywords and handlers.

✅ Near Real-Time Processing - Typically 5-10 seconds per message once the Whisper model has been downloaded and loaded.

✅ Customizable - Add weather, status, commands, or anything else.

✅ Production Ready - Built from real usage in Clawdbot.

Common Use Cases

Weather Bot

// User says: "What's the weather in Delhi?"
// Response: "Current weather in Delhi is 19°C..."

// (Built-in intent, just enable it)

Smart Home Control

// User says: "Turn on the lights"
// Handler: Sends signal to smart home API
// Response: "Lights turned on"

Task Manager

// User says: "Add milk to shopping list"
// Handler: Adds to database
// Response: "Added milk to your list"
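
As a rough sketch of what such a handler might do, the example below pulls the item name out of the transcript with a regular expression and stores it in an in-memory array. The function names (parseShoppingItem, handleAddItem) and the array standing in for the database are hypothetical and not part of the skill.

```javascript
// Extract the item from phrases like "Add milk to shopping list"; returns null if nothing matches.
function parseShoppingItem(transcript) {
  const pattern = /^\s*add\s+(.+?)\s+to\s+(?:my\s+|the\s+)?shopping\s+list[.!]?\s*$/i;
  const match = pattern.exec(transcript);
  return match ? match[1] : null;
}

// In-memory stand-in for the database mentioned above.
const shoppingList = [];

// Hypothetical handler: store the item and build the spoken response.
function handleAddItem(transcript) {
  const item = parseShoppingItem(transcript);
  if (!item) return { status: 'error', response: 'What should I add?' };
  shoppingList.push(item);
  return { status: 'success', response: `Added ${item} to your list` };
}
```

Real transcripts vary more than this pattern allows, so a production handler would likely need looser matching or a follow-up question when no item is found.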

Status Checker

// User says: "Is the system running?"
// Handler: Checks system status
// Response: "All systems online"

Customization

Add a Custom Intent

Edit voice-processor.js:

Add to INTENTS map:

const INTENTS = {
  'shopping': {
    keywords: ['shopping', 'list', 'buy', 'खरीद'],
    handler: 'handleShopping'
  }
};

Add handler:

const handlers = {
  async handleShopping(language = 'en') {
    return {
      status: 'success',
      response: language === 'en' 
        ? "What would you like to add to your shopping list?"
        : "आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?"
    };
  }
};
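
To show how the INTENTS map and the handlers object might fit together, here is a minimal sketch of keyword matching and dispatch. The matchIntent and dispatch functions, the substring-based matching, and the fallback reply are assumptions for illustration; the actual logic in voice-processor.js may differ.

```javascript
// Illustrative copies of the structures shown above.
const INTENTS = {
  shopping: { keywords: ['shopping', 'list', 'buy', 'खरीद'], handler: 'handleShopping' },
};

const handlers = {
  async handleShopping(language = 'en') {
    return {
      status: 'success',
      response: language === 'en'
        ? 'What would you like to add to your shopping list?'
        : 'आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?',
    };
  },
};

// Return the name of the first intent with a keyword contained in the transcript, or null.
function matchIntent(transcript, intents = INTENTS) {
  const text = transcript.toLowerCase();
  for (const [name, { keywords }] of Object.entries(intents)) {
    if (keywords.some((kw) => text.includes(kw.toLowerCase()))) return name;
  }
  return null;
}

// Look up the handler by its string name and call it; unknown intents get a fallback reply.
async function dispatch(transcript, language = 'en') {
  const intent = matchIntent(transcript);
  if (!intent) {
    return { status: 'unknown_intent', response: "Sorry, I didn't understand that." };
  }
  const result = await handlers[INTENTS[intent].handler](language);
  return { ...result, intent };
}
```

Plain substring matching is simple but coarse: the keyword "list" would also match "listen", so word-boundary checks or a scoring scheme may be worth adding once several intents share similar vocabulary.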

Support More Languages

Update detectLanguage() with the Unicode range of your language's script:

const urduChars = /[\u0600-\u06FF]/g; // Arabic script block, used by Urdu

Handle the new language code in your handler responses:

return language === 'ur' ? 'Urdu response' : 'English response';

Set language in transcribe.py:

result = model.transcribe(data, language="ur")
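
For orientation, a script-based detectLanguage() along the lines described above might look like the sketch below. It assumes detection works by counting characters per Unicode block, which is inferred from the urduChars example rather than confirmed by the source; the constant names and the tie-breaking rule are illustrative.

```javascript
// Unicode blocks used as a rough script test.
const DEVANAGARI = /[\u0900-\u097F]/g; // Hindi
const ARABIC = /[\u0600-\u06FF]/g;     // Urdu (the same block also covers Arabic and Persian)

// Return 'hi', 'ur', or 'en' depending on which script dominates the text.
function detectLanguage(text) {
  const hindiCount = (text.match(DEVANAGARI) || []).length;
  const urduCount = (text.match(ARABIC) || []).length;
  if (hindiCount === 0 && urduCount === 0) return 'en';
  return urduCount > hindiCount ? 'ur' : 'hi';
}
```

Because the Arabic block is shared by several languages, this approach identifies the script rather than the language itself. Passing an explicit language to Whisper, as in transcribe.py above, remains the more reliable signal.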

Change Transcription Model

In transcribe.py:

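# Load exactly one model; larger models are more accurate but slower.
# model = whisper.load_model("tiny")    # Fastest, ~75MB download (39M parameters)
model = whisper.load_model("base")      # Default, ~140MB
# model = whisper.load_model("small")   # Better, ~466MB
# model = whisper.load_model("medium")  # Good, ~1.5GB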

Architecture

Scripts:

transcribe.py - Whisper transcription (Python)
voice-processor.js - Core logic (intent parsing, handlers)
voice-listener-daemon.js - Auto-listener watching for new messages
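
Since transcription runs in Python while the core logic runs in Node.js, the two need some bridge. The source does not describe it, so the sketch below shows only one plausible approach: spawn the script as a child process and parse its standard output. The runTranscriber helper, the command-line arguments, and the assumption that the script prints a single JSON object are all hypothetical.

```javascript
const { execFileSync } = require('child_process');

// Run an external transcriber and parse its stdout as JSON.
// Assumption: the script prints one JSON object such as {"text": "...", "language": "en"}.
function runTranscriber(command, args) {
  const stdout = execFileSync(command, args, {
    encoding: 'utf8',
    timeout: 120000, // give slow first runs (model download) up to two minutes
  });
  return JSON.parse(stdout);
}

// Hypothetical usage:
// const result = runTranscriber('python3', ['scripts/transcribe.py', 'voice-message.ogg']);
```

The synchronous call keeps the example short. A daemon handling several messages would more likely use the asynchronous execFile or spawn so that one slow transcription does not block the event loop.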

References:

SETUP.md - Installation and configuration
API.md - Detailed function documentation

Integration with Clawdbot

If running as a Clawdbot skill, hook into message events:

// In your Clawdbot handler
const { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');

message.on('voice', async (audioBuffer) => {
  const result = await processVoiceNote(audioBuffer, message.from);

  // Send the response back as text
  await message.reply(result.response);

  // Or send it as voice instead (requires a TTS helper; sendVoiceMessage is not defined here)
  // await sendVoiceMessage(result.response);
});

Performance

First run: ~30 seconds (downloads Whisper model, ~140MB)
Typical: 5-10 seconds per message
Memory: ~1.5GB (base model)
Languages: English, Hindi (easily extended)

Supported Audio Formats

OGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile. Opus support requires libsndfile 1.0.29 or later, and MP3 support requires 1.1.0 or later.

WhatsApp uses Opus-encoded OGG by default, which works out of the box.

Troubleshooting

"No module named 'whisper'"

pip install openai-whisper

"No module named 'soundfile'"

pip install soundfile

Voice messages not processing?

Check: clawdbot status (is it running?)
Check: ~/.clawdbot/media/inbound/ (files arriving?)
Run daemon manually: node scripts/voice-listener-daemon.js (see logs)

Slow transcription?

Use a smaller model in transcribe.py: whisper.load_model("tiny"), or "base" if you switched to a larger one.

License

MIT - Use freely, customize, contribute back!

Built for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.

Version History

1 version in total

v1.0.0 (current)

2026-03-28 14:35
