
whatsappVoiceOpenSkill

Real-time WhatsApp voice message processing. Transcribe voice notes to text via Whisper, detect intent, execute handlers, and send responses. Use when building conversational voice interfaces for WhatsApp. Supports English and Hindi, customizable intents (weather, status, commands), automatic language detection, and streaming responses via TTS.
syedateebulislam
Category: Communication & Collaboration | Source: clawhub | Version: v1.0.0 (latest, 1 version) | API key: not required
Stars: 0 | Downloads: 2,863 | Installs: 5

Overview

WhatsApp Voice Talk

Turn WhatsApp voice messages into real-time conversations. This skill provides a complete pipeline: voice → transcription → intent detection → response generation → text-to-speech.

Perfect for:

  • Voice assistants on WhatsApp
  • Hands-free command interfaces
  • Multi-lingual chatbots
  • IoT voice control (drones, smart home, etc.)

Quick Start

1. Install Dependencies

pip install openai-whisper soundfile numpy

2. Process a Voice Message

const { processVoiceNote } = require('./scripts/voice-processor');
const fs = require('fs');

// CommonJS has no top-level await, so run the steps inside an async function.
async function main() {
  // Read a voice message (OGG, WAV, MP3, etc.)
  const buffer = fs.readFileSync('voice-message.ogg');

  // Process it
  const result = await processVoiceNote(buffer);

  console.log(result);
  // {
  //   status: 'success',
  //   response: "Current weather in Delhi is 19°C, haze. Humidity is 56%.",
  //   transcript: "What's the weather today?",
  //   intent: 'weather',
  //   language: 'en',
  //   timestamp: 1769860205186
  // }
}

main().catch(console.error);

3. Run Auto-Listener

For automatic processing of incoming WhatsApp voice messages:

node scripts/voice-listener-daemon.js

This polls ~/.clawdbot/media/inbound/ every 5 seconds and processes any new voice files it finds.
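The polling behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in voice-listener-daemon.js: the function names, the extension list, and the in-memory "seen" set are all assumptions made for the example.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed set of extensions that count as voice files (hypothetical).
const AUDIO_EXTENSIONS = new Set(['.ogg', '.opus', '.wav', '.mp3']);

// Return audio files in `dir` that have not been seen before,
// and remember them so the next poll skips them.
function findNewVoiceFiles(dir, seen) {
  if (!fs.existsSync(dir)) return [];
  const fresh = [];
  for (const name of fs.readdirSync(dir).sort()) {
    const ext = path.extname(name).toLowerCase();
    if (!AUDIO_EXTENSIONS.has(ext) || seen.has(name)) continue;
    seen.add(name);
    fresh.push(path.join(dir, name));
  }
  return fresh;
}

// Poll the directory on a fixed interval and hand each new file to `onFile`.
// Returns a function that stops the polling.
function startPolling(dir, onFile, intervalMs = 5000) {
  const seen = new Set();
  const timer = setInterval(() => {
    for (const file of findNewVoiceFiles(dir, seen)) onFile(file);
  }, intervalMs);
  return () => clearInterval(timer);
}

// Example wiring (not started here):
// const inbound = path.join(require('os').homedir(), '.clawdbot', 'media', 'inbound');
// const stop = startPolling(inbound, (file) => console.log('new voice file:', file));
```

Tracking seen names in memory means a restart would reprocess old files; a real daemon would probably persist that state or move processed files elsewhere.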

How It Works

Incoming Voice Message
        ↓
    Transcribe (local Whisper model)
        ↓
  "What's the weather?"
        ↓
  Detect Language & Intent
        ↓
   Match against INTENTS
        ↓
   Execute Handler
        ↓
   Generate Response
        ↓
   Convert to TTS
        ↓
  Send back via WhatsApp
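One way to picture the flow above is as a chain of injected stages. The sketch below is only an illustration under that assumption: runPipeline, the stage names, and the stubs are hypothetical and are not taken from voice-processor.js.

```javascript
// Run the stages in the order shown above. Every stage is injected, so the
// stubs below can be swapped for real Whisper, TTS, and WhatsApp calls.
async function runPipeline(audioBuffer, stages) {
  const transcript = await stages.transcribe(audioBuffer);
  const language = stages.detectLanguage(transcript);
  const intent = stages.detectIntent(transcript);
  const response = await stages.handle(intent, language);
  const voiceReply = await stages.synthesize(response, language);
  await stages.send(voiceReply);
  return { status: 'success', response, transcript, intent, language, timestamp: Date.now() };
}

// Stub stages for illustration only (all hypothetical).
const sent = [];
const stubStages = {
  transcribe: async () => "What's the weather?",
  detectLanguage: () => 'en',
  detectIntent: (text) => (text.toLowerCase().includes('weather') ? 'weather' : 'unknown'),
  handle: async (intent) => (intent === 'weather' ? 'It is sunny.' : "Sorry, I didn't understand."),
  synthesize: async (text) => Buffer.from(text), // stands in for TTS audio
  send: async (audio) => { sent.push(audio); },
};

runPipeline(Buffer.alloc(0), stubStages).then((result) => {
  console.log(result.intent, '->', result.response);
});
```

Passing the stages in as an object keeps the ordering logic testable without a Whisper model or a WhatsApp connection.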

Key Features

Zero Setup Complexity - No FFmpeg, no complex dependencies. Uses soundfile + Whisper.

Multi-Language - Automatic English/Hindi detection. Extend easily.

Intent-Driven - Define custom intents with keywords and handlers.

Real-Time Processing - 5-10 seconds per message (after first model load).

Customizable - Add weather, status, commands, or anything else.

Production Ready - Built from real usage in Clawdbot.

Common Use Cases

Weather Bot

// User says: "What's the weather in Bangalore?"
// Response: "Current weather in Bangalore is 19°C..."

// (Built-in intent, just enable it)

Smart Home Control

// User says: "Turn on the lights"
// Handler: Sends signal to smart home API
// Response: "Lights turned on"

Task Manager

// User says: "Add milk to shopping list"
// Handler: Adds to database
// Response: "Added milk to your list"

Status Checker

// User says: "Is the system running?"
// Handler: Checks system status
// Response: "All systems online"
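As a rough idea of what a handler for the Task Manager case might look like, the sketch below pulls the item out of the transcript with a regular expression. The function name, the phrase pattern, and the in-memory list are assumptions for illustration; the skill's real handlers may work differently.

```javascript
// In-memory stand-in for a database (hypothetical).
const shoppingList = [];

// Extract the item from phrases like "Add milk to shopping list".
function handleAddItem(transcript) {
  const match = transcript
    .trim()
    .match(/^add (.+?) to (?:my |the )?(?:shopping )?list[.!]?$/i);
  if (!match) {
    return { status: 'error', response: "Sorry, I didn't catch what to add." };
  }
  const item = match[1].toLowerCase();
  shoppingList.push(item);
  return { status: 'success', response: `Added ${item} to your list` };
}

console.log(handleAddItem('Add milk to shopping list').response);
```

A fixed pattern like this only covers one phrasing; transcripts from Whisper vary, so a real handler would likely need looser matching.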

Customization

Add a Custom Intent

Edit voice-processor.js:

  1. Add to the INTENTS map:

    const INTENTS = {
      'shopping': {
        keywords: ['shopping', 'list', 'buy', 'खरीद'],
        handler: 'handleShopping'
      }
    };

  2. Add the handler:

    const handlers = {
      async handleShopping(language = 'en') {
        return {
          status: 'success',
          response: language === 'en' 
            ? "What would you like to add to your shopping list?"
            : "आप अपनी शॉपिंग लिस्ट में क्या जोड़ना चाहते हैं?"
        };
      }
    };
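For context, keyword-based intent matching of the kind implied by the INTENTS map could look roughly like this. It is a sketch under assumptions: detectIntent and the 'weather' entry are hypothetical, and the actual matching logic in voice-processor.js may differ.

```javascript
// Same shape as the INTENTS map above; the 'weather' entry is an assumed example.
const INTENTS = {
  weather: { keywords: ['weather', 'temperature', 'मौसम'], handler: 'handleWeather' },
  shopping: { keywords: ['shopping', 'list', 'buy', 'खरीद'], handler: 'handleShopping' },
};

// Return the name of the first intent whose keyword appears in the transcript,
// or null when nothing matches.
function detectIntent(transcript, intents = INTENTS) {
  const text = transcript.toLowerCase();
  for (const [name, { keywords }] of Object.entries(intents)) {
    if (keywords.some((keyword) => text.includes(keyword.toLowerCase()))) {
      return name;
    }
  }
  return null;
}

console.log(detectIntent('Add milk to shopping list'));
```

Plain substring matching is simple but coarse: a keyword such as 'list' also matches "listen", so short keywords are worth choosing carefully.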
    

Support More Languages

  1. Update detectLanguage() for your language's Unicode range:

    const urduChars = /[\u0600-\u06FF]/g; // Add this

  2. Add the language code to handler return values:

    return language === 'ur' ? 'Urdu response' : 'English response';

  3. Set the language in transcribe.py:

    result = model.transcribe(data, language="ur")
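A script-based detectLanguage() along the lines of step 1 might be sketched as below. This is an assumed implementation for illustration, not the skill's own: it counts characters in the Devanagari and Arabic-script Unicode blocks and falls back to English.

```javascript
// Unicode blocks used as a rough script signal.
const DEVANAGARI = /[\u0900-\u097F]/g;    // Hindi
const ARABIC_SCRIPT = /[\u0600-\u06FF]/g; // Urdu (also Arabic and Persian)

// Count how many characters in `text` match `pattern`.
function countMatches(text, pattern) {
  const found = text.match(pattern);
  return found ? found.length : 0;
}

// Pick the language whose script dominates; default to English.
function detectLanguage(text) {
  const hindi = countMatches(text, DEVANAGARI);
  const urdu = countMatches(text, ARABIC_SCRIPT);
  if (hindi === 0 && urdu === 0) return 'en';
  return hindi >= urdu ? 'hi' : 'ur';
}

console.log(detectLanguage('आज मौसम कैसा है'));
```

Script ranges cannot separate languages that share a script (Urdu versus Arabic, for example), so this approach only works when each supported language uses a distinct script.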
    

Change Transcription Model

In transcribe.py:

# Pick one; larger models are more accurate but slower.
model = whisper.load_model("tiny")    # Fastest, 39M parameters (~75MB download)
model = whisper.load_model("base")    # Default, 140MB
model = whisper.load_model("small")   # Better, 466MB
model = whisper.load_model("medium")  # Good, 1.5GB

Architecture

Scripts:

  • transcribe.py - Whisper transcription (Python)
  • voice-processor.js - Core logic (intent parsing, handlers)
  • voice-listener-daemon.js - Auto-listener watching for new messages
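Since the core logic is JavaScript and transcription is Python, the two have to be bridged somehow. One plausible approach is spawning the script as a child process, sketched below. Everything about the interface here is an assumption: the positional audio path, the --model and --language flags, and the transcript on stdout are hypothetical, so check transcribe.py for its real command line.

```javascript
const { execFile } = require('child_process');

// Build the command line. The positional path and the --model / --language
// flags are hypothetical; transcribe.py may expect something different.
function buildTranscribeCommand(audioPath, { model = 'base', language } = {}) {
  const args = ['scripts/transcribe.py', audioPath, '--model', model];
  if (language) args.push('--language', language);
  return { command: 'python3', args };
}

// Run the script and resolve with its trimmed stdout
// (assumed here to be the plain-text transcript).
function transcribeFile(audioPath, options) {
  const { command, args } = buildTranscribeCommand(audioPath, options);
  return new Promise((resolve, reject) => {
    execFile(command, args, { maxBuffer: 10 * 1024 * 1024 }, (error, stdout, stderr) => {
      if (error) return reject(new Error(stderr || error.message));
      resolve(stdout.trim());
    });
  });
}

// Example (requires Python and the script to be present):
// transcribeFile('voice-message.ogg', { language: 'hi' }).then(console.log);
```

Using execFile with an argument array avoids shell interpolation of the file path, which matters when file names come from inbound messages.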

References:

  • SETUP.md - Installation and configuration
  • API.md - Detailed function documentation

Integration with Clawdbot

If running as a Clawdbot skill, hook into message events:

// In your Clawdbot handler
const { processVoiceNote } = require('skills/whatsapp-voice-talk/scripts/voice-processor');

message.on('voice', async (audioBuffer) => {
  const result = await processVoiceNote(audioBuffer, message.from);

  // Send the response back as text
  await message.reply(result.response);

  // Or, instead of the text reply, send it as voice (requires TTS;
  // sendVoiceMessage is not defined in this snippet)
  // await sendVoiceMessage(result.response);
});

Performance

  • First run: ~30 seconds (downloads Whisper model, ~140MB)
  • Typical: 5-10 seconds per message
  • Memory: ~1.5GB (base model)
  • Languages: English, Hindi (easily extended)

Supported Audio Formats

OGG (Opus), WAV, FLAC, MP3, CAF, AIFF, and more via libsndfile.

WhatsApp voice notes are Opus-encoded OGG by default, so they work out of the box.

Troubleshooting

"No module named 'whisper'"

pip install openai-whisper

"No module named 'soundfile'"

pip install soundfile

Voice messages not processing?

  1. Check: clawdbot status (is it running?)
  2. Check: ~/.clawdbot/media/inbound/ (files arriving?)
  3. Run daemon manually: node scripts/voice-listener-daemon.js (see logs)

Slow transcription?

Use a smaller model: in transcribe.py, switch to whisper.load_model("tiny"), or back to "base" if you moved to a larger one.

Further Reading

  • Setup Guide: See references/SETUP.md for detailed installation and configuration
  • API Reference: See references/API.md for function signatures and examples
  • Examples: Check scripts/ for working code

License

MIT - Use freely, customize, contribute back!


Built for real-world use in Clawdbot. Battle-tested with multiple languages and use cases.

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-28 14:35, security status: safe

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
