Video Narrator

Generate SenseAudio TTS narration tracks for videos, including timestamped segments, style variants, and editor-ready voiceover exports. Use when users need voiceover narration.
Author: scikkk
Category: content creation · clawhub v1.0.1 · 2 versions · API key: required

Overview

SenseAudio Video Narrator

Create professional narration audio for videos with timing-aware segmentation, natural delivery, and editor-friendly exports.

What This Skill Does

  • Generate narration audio synchronized to script timestamps
  • Match narration style to video genre such as documentary or tutorial
  • Control pacing with official TTS parameters and text break markers
  • Create multiple narration takes with different voices or styles
  • Export audio segments and merged narration tracks for editing workflows

Credential and Dependency Rules

  • Read the API key from SENSEAUDIO_API_KEY.
  • Send the key only in the request header, as Authorization: Bearer followed by the value of SENSEAUDIO_API_KEY.
  • Do not place API keys in query parameters, logs, or saved examples.
  • If Python helpers are used, this skill expects python3, requests, and pydub.
  • pydub is used only for optional local audio assembly and mixing.
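
The credential rules above could be applied along the following lines. This is a minimal sketch, not part of the skill itself: the helper names build_auth_headers and redact_headers are hypothetical, and masking the header before logging is one assumed way to keep the key out of logs.

```python
import os


def build_auth_headers(env=os.environ):
    """Build request headers from SENSEAUDIO_API_KEY (hypothetical helper)."""
    api_key = env.get("SENSEAUDIO_API_KEY", "").strip()
    if not api_key:
        # Fail early rather than sending an empty Bearer token.
        raise RuntimeError("SENSEAUDIO_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def redact_headers(headers):
    """Return a copy that is safe to log: the Authorization value is masked."""
    safe = dict(headers)
    if "Authorization" in safe:
        safe["Authorization"] = "Bearer ***"
    return safe
```

Logging redact_headers(headers) instead of headers keeps the key out of log output while leaving the real request headers unchanged.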

Official TTS Constraints

Use the official SenseAudio TTS rules summarized below:

  • HTTP endpoint: POST https://api.senseaudio.cn/v1/t2a_v2
  • Model: SenseAudio-TTS-1.0
  • Max text length per request: 10000 characters
  • voice_setting.voice_id is required
  • voice_setting.speed range: 0.5-2.0
  • voice_setting.pitch range: -12 to 12
  • Optional audio formats: mp3, wav, pcm, flac
  • Optional sample rates: 8000, 16000, 22050, 24000, 32000, 44100
  • Optional MP3 bitrates: 32000, 64000, 128000, 256000
  • Optional channels: 1 or 2
  • extra_info.audio_length returns segment duration in milliseconds
  • Inline break markup is supported within text
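
A local pre-flight check can catch violations of these constraints before a request is sent. The sketch below encodes only the limits listed above; the function name validate_request and its argument names are hypothetical, and it assumes the character limit counts Python string length, which may differ from how the service counts.

```python
ALLOWED_FORMATS = {"mp3", "wav", "pcm", "flac"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
ALLOWED_MP3_BITRATES = {32000, 64000, 128000, 256000}
MAX_TEXT_CHARS = 10000


def validate_request(text, voice_id, speed=1.0, pitch=0, fmt="mp3",
                     sample_rate=32000, bitrate=128000, channel=2):
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    if not voice_id:
        problems.append("voice_setting.voice_id is required")
    # Assumption: the limit is compared against len(text).
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters, limit is {MAX_TEXT_CHARS}")
    if not 0.5 <= speed <= 2.0:
        problems.append("speed must be within 0.5-2.0")
    if not -12 <= pitch <= 12:
        problems.append("pitch must be within -12 to 12")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate}")
    # The bitrate list above applies to MP3 output only.
    if fmt == "mp3" and bitrate not in ALLOWED_MP3_BITRATES:
        problems.append(f"unsupported MP3 bitrate: {bitrate}")
    if channel not in (1, 2):
        problems.append("channel must be 1 or 2")
    return problems
```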

Recommended Workflow

  1. Prepare the script:
    • Split narration into timestamped segments.
    • Keep each segment comfortably below the 10000 character limit.
  2. Choose a voice and pacing profile:
    • Pick a voice_id and tune speed, pitch, and optional vol.
    • Use shorter segments when timing precision matters.
  3. Generate audio segments:
    • Call the TTS API for each segment.
    • Decode data.audio from hex before saving.
    • Capture extra_info.audio_length for timeline metadata.
  4. Assemble the narration track locally:
    • Use pydub to position clips on a silent master track.
    • Keep per-segment files for easier editor import and retiming.
  5. Validate timing against the video:
    • Leave small gaps when natural pacing is needed.
    • Adjust segment boundaries instead of overusing extreme speed values.
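
The timing validation step can be approximated with a small check that compares each segment's end time against the next segment's start and the video length. This is a sketch under stated assumptions: the function name find_timing_conflicts and the min_gap_ms parameter are hypothetical, and each segment is assumed to be a dict with "timestamp" and "duration_ms" in milliseconds, where duration_ms comes from extra_info.audio_length.

```python
def find_timing_conflicts(segments, video_duration_ms, min_gap_ms=0):
    """Report segments that run into the next segment or past the video end."""
    conflicts = []
    ordered = sorted(segments, key=lambda s: s["timestamp"])
    for index, seg in enumerate(ordered):
        end_ms = seg["timestamp"] + seg["duration_ms"]
        if index + 1 < len(ordered):
            # A segment must finish min_gap_ms before the next one starts.
            limit_ms = ordered[index + 1]["timestamp"] - min_gap_ms
            kind = "overlaps_next"
        else:
            # The last segment must finish before the video ends.
            limit_ms = video_duration_ms
            kind = "exceeds_video"
        if end_ms > limit_ms:
            conflicts.append({
                "timestamp": seg["timestamp"],
                "kind": kind,
                "overrun_ms": end_ms - limit_ms,
            })
    return conflicts
```

A reported overrun suggests trimming the script text or moving the segment boundary, in line with the advice to avoid extreme speed values.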

Minimal Timed Narration Helper

import binascii
import os
import re

import requests

API_KEY = os.environ["SENSEAUDIO_API_KEY"]
API_URL = "https://api.senseaudio.cn/v1/t2a_v2"


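def parse_timed_script(script):
    """Split a script into segments keyed by [HH:MM:SS] markers.

    Returns a list of {"timestamp": start in ms, "text": narration} dicts.
    """
    # A segment runs from its marker up to the next line that starts with "["
    # or to the end of the script.
    pattern = r"\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+?)(?=\n\[|\Z)"
    segments = []
    for match in re.finditer(pattern, script, re.DOTALL):
        hours, minutes, seconds, text = match.groups()
        timestamp_ms = (int(hours) * 3600 + int(minutes) * 60 + int(seconds)) * 1000
        segments.append({"timestamp": timestamp_ms, "text": text.strip()})
    return segments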


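def synthesize_segment(text, voice_id, speed=1.0, pitch=0, vol=1.0):
    """Synthesize one narration segment and return its audio bytes and duration."""
    response = requests.post(
        API_URL,
        headers={
            # The key travels only in the Authorization header.
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "model": "SenseAudio-TTS-1.0",
            "text": text,
            "stream": False,
            "voice_setting": {
                "voice_id": voice_id,
                "speed": speed,  # allowed range: 0.5-2.0
                "pitch": pitch,  # allowed range: -12 to 12
                "vol": vol,
            },
            "audio_setting": {
                "format": "mp3",
                "sample_rate": 32000,
                "bitrate": 128000,
                "channel": 2,
            },
        },
        timeout=60,
    )
    # Raise on HTTP-level errors before touching the body.
    response.raise_for_status()
    data = response.json()
    return {
        # data.audio is hex-encoded and must be decoded before saving.
        "audio_bytes": binascii.unhexlify(data["data"]["audio"]),
        # Segment duration in milliseconds, used for timeline metadata.
        "duration_ms": data["extra_info"]["audio_length"],
        "trace_id": data.get("trace_id"),
    }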
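
To keep per-segment files for editor import, the result of synthesize_segment can be written to disk and recorded in the shape the assembly step expects. This is a minimal sketch: save_segment and its file naming scheme are hypothetical, and it assumes the result dict carries "audio_bytes" and "duration_ms" as returned above, with MP3 output.

```python
from pathlib import Path


def save_segment(result, timestamp_ms, out_dir, index):
    """Write one synthesized clip to disk and return its timeline entry."""
    # Zero-padded names keep files sorted in timeline order in a file browser.
    out_path = Path(out_dir) / f"segment_{index:03d}_{timestamp_ms:08d}ms.mp3"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(result["audio_bytes"])
    return {
        "file": str(out_path),
        "timestamp": timestamp_ms,
        "duration_ms": result["duration_ms"],
    }
```

The returned entries carry the "file" and "timestamp" keys, so a list of them can be used as the audio_segments argument of the local assembly step.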

Local Assembly Pattern

from pydub import AudioSegment


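def create_synced_narration(audio_segments, video_duration_ms):
    """Place each clip at its timestamp (ms) on a silent track as long as the video."""
    narration_track = AudioSegment.silent(duration=video_duration_ms)
    for segment in audio_segments:
        clip = AudioSegment.from_file(segment["file"])
        # overlay() keeps the base track length, so audio past the video end is cut off.
        narration_track = narration_track.overlay(clip, position=segment["timestamp"])
    return narration_track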

Style Presets

  • Documentary: slower speed such as 0.95, neutral pitch
  • Tutorial: speed near 1.0, slightly warmer pitch
  • Commercial: modestly faster speed, slightly higher pitch

Prefer conservative tuning and script editing over extreme voice parameter changes.
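
The presets could be captured as a small lookup table. In this sketch only the documentary speed of 0.95 comes from the list above; every other number is an illustrative assumption, including treating "warmer" as a slightly lower pitch. The names STYLE_PRESETS and voice_setting_for are hypothetical.

```python
STYLE_PRESETS = {
    "documentary": {"speed": 0.95, "pitch": 0},  # 0.95 is from the preset list
    "tutorial": {"speed": 1.0, "pitch": -1},     # assumed: "warmer" = slightly lower
    "commercial": {"speed": 1.1, "pitch": 1},    # assumed values
}


def voice_setting_for(style, voice_id, **overrides):
    """Build a voice_setting dict from a preset, clamped to the documented ranges."""
    preset = STYLE_PRESETS[style.lower()]
    setting = {"voice_id": voice_id, **preset, **overrides}
    # Clamp to the documented limits: speed 0.5-2.0, pitch -12 to 12.
    setting["speed"] = min(2.0, max(0.5, setting["speed"]))
    setting["pitch"] = min(12, max(-12, setting["pitch"]))
    return setting
```

Generating alternate takes then amounts to calling voice_setting_for with different style names or voice IDs for the same script.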

Output Options

  • Per-segment narration clips in mp3 or wav
  • Timing metadata in json
  • Merged narration track for video editors
  • Optional alternate takes with different styles
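
Timing metadata in JSON might look like the output of the sketch below. The field names (index, start_ms, end_ms, file, text) and the function name build_timing_metadata are assumptions for illustration, not a format defined by SenseAudio or by any particular video editor.

```python
import json


def build_timing_metadata(segments, voice_id):
    """Serialize segment timing to a JSON string for import scripts or review."""
    entries = []
    ordered = sorted(segments, key=lambda s: s["timestamp"])
    for index, seg in enumerate(ordered):
        entries.append({
            "index": index,
            "start_ms": seg["timestamp"],
            "end_ms": seg["timestamp"] + seg["duration_ms"],
            "file": seg["file"],
            "text": seg.get("text", ""),
        })
    # ensure_ascii=False keeps non-ASCII narration text readable in the file.
    return json.dumps({"voice_id": voice_id, "segments": entries},
                      ensure_ascii=False, indent=2)
```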

Safety Notes

  • Do not hardcode credentials.
  • Do not assume local media tooling exists beyond what is declared here.
  • Treat returned trace_id and generated narration assets as potentially sensitive production data.

Version History

2 versions in total

  • v1.0.1 (current)
    2026-03-29 23:31, security checks: safe
  • v1.0.0
    2026-03-14 04:54

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
