
musa-torch-coding

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
lipeidcc
AI Intelligence clawhub v1.0.0 1 version 100000 Key: required


MUSA Torch Coding

Guide for generating PyTorch code that runs on Moore Threads (摩尔线程) MUSA GPUs using torch_musa.

Overview

MUSA (Meta-computing Unified System Architecture) is Moore Threads' GPU computing platform. This skill helps generate code that:

  • Runs on Moore Threads GPUs via torch_musa
  • Converts CUDA code to MUSA-compatible code
  • Sets up proper environments (conda v1.2/v1.3)
  • Follows MUSA best practices

Key Differences: CUDA vs MUSA

| CUDA                           | MUSA                           |
| ------------------------------ | ------------------------------ |
| torch.cuda                     | torch.musa                     |
| torch.device("cuda")           | torch.device("musa")           |
| torch.cuda.is_available()      | torch.musa.is_available()      |
| backend='nccl'                 | backend='mccl'                 |
| torch.cuda.device_count()      | torch.musa.device_count()      |
| torch.cuda.get_device_name()   | torch.musa.get_device_name()   |

Environment Setup

⚠️ Important: MUSA Uses Pre-configured Conda Environments

DO NOT install PyTorch, vLLM, or related packages manually. MUSA environments are custom-built and include:

  • MUSA-specific PyTorch builds (not compatible with standard PyTorch)
  • MUSA-customized vLLM versions
  • MUSA drivers and SDK integration

Installing standard packages from PyPI will break the environment.

Conda Environment (v1.2/v1.3)

MUSA provides pre-configured conda environments. Common environment names:

  • v1.2 - MUSA SDK v1.2 environment
  • v1.3 - MUSA SDK v1.3 environment (newer)

```bash
# List available MUSA environments
conda env list | grep -E "(v1\.2|v1\.3|musa)"

# Activate the appropriate environment
conda activate v1.2  # or v1.3

# Verify MUSA availability
python -c "import torch_musa; import torch; print(torch.musa.is_available())"
```

Environment Detection & Setup

If no MUSA conda environment is detected:

  1. Check if MUSA is installed:

```bash
which musaInfo       # Should show musaInfo path
ls /usr/local/musa/  # MUSA SDK location
```

  2. If MUSA is not set up:
     • Use the musa-env-setup skill for complete environment installation
     • The skill covers SDK installation, conda setup, and vLLM-MUSA configuration
  3. Common conda environment locations:
     • /opt/conda/envs/
     • ~/conda/envs/
     • /usr/local/conda/envs/
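
The detection steps above can be sketched in Python as follows. This is a minimal sketch, not part of the skill itself: the helper names `find_musa_envs` and `musa_sdk_present` are hypothetical, and the directory list and environment names are simply the ones given in this section.

```python
import shutil
from pathlib import Path

# Candidate locations and names taken from this section; adjust for your machine.
CONDA_ENV_DIRS = ["/opt/conda/envs", "~/conda/envs", "/usr/local/conda/envs"]
MUSA_ENV_NAMES = ("v1.2", "v1.3")


def find_musa_envs(env_dirs=CONDA_ENV_DIRS, names=MUSA_ENV_NAMES):
    """Return conda env directories named like a MUSA env (v1.2, v1.3, or *musa*)."""
    found = []
    for raw in env_dirs:
        base = Path(raw).expanduser()
        if not base.is_dir():
            continue  # this location does not exist on the current machine
        for child in sorted(base.iterdir()):
            if child.is_dir() and (child.name in names or "musa" in child.name.lower()):
                found.append(child)
    return found


def musa_sdk_present(sdk_root="/usr/local/musa"):
    """True if musaInfo is on PATH or the SDK directory exists."""
    return shutil.which("musaInfo") is not None or Path(sdk_root).is_dir()


print("MUSA SDK detected:", musa_sdk_present())
for env in find_musa_envs():
    print("MUSA conda env:", env)
```

If no environment is printed and the SDK is not detected, the musa-env-setup skill mentioned above is the next step.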

Key Environment Variables

| Variable                       | Purpose                   |
| ------------------------------ | ------------------------- |
| MUSA_VISIBLE_DEVICES=0,1,2,3   | Control visible GPU IDs   |
| MUSA_LAUNCH_BLOCKING=1         | Synchronous kernel launch |
| MUDNN_LOG_LEVEL=INFO           | Enable MUDNN logging      |
| TORCH_SHOW_CPP_STACKTRACES=1   | Show C++ stack traces     |
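
These variables can also be set from Python rather than the shell. The sketch below assumes, by analogy with their CUDA counterparts, that they are read when the runtime initializes, so they should be set before `torch` and `torch_musa` are imported. The helper name `configure_musa_env` is hypothetical.

```python
import os


def configure_musa_env(visible_devices=None, debug=False):
    """Set MUSA-related variables in os.environ and return what was set."""
    updates = {}
    if visible_devices is not None:
        # e.g. [0, 1] becomes "0,1"
        updates["MUSA_VISIBLE_DEVICES"] = ",".join(str(i) for i in visible_devices)
    if debug:
        updates["MUSA_LAUNCH_BLOCKING"] = "1"        # synchronous kernel launch
        updates["MUDNN_LOG_LEVEL"] = "INFO"          # enable MUDNN logging
        updates["TORCH_SHOW_CPP_STACKTRACES"] = "1"  # show C++ stack traces
    os.environ.update(updates)
    return updates


# Assumed ordering: call this before importing torch / torch_musa.
applied = configure_musa_env(visible_devices=[0, 1], debug=True)
print(applied)
```

Enabling the debug variables only while diagnosing a problem is a reasonable default, since synchronous launches usually slow execution down.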

Code Generation Rules

When generating PyTorch code for MUSA:

  1. Always import torch_musa

```python
import torch
import torch_musa  # Must import before using torch.musa
```

  2. Use torch.device("musa")

```python
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")
tensor = torch.tensor([1.0, 2.0], device=device)
```

  3. Use 'mccl' for distributed training

```python
import torch.distributed as dist

# rank and world_size are supplied by the launcher
dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)
```

  4. Mixed precision (AMP) is supported

```python
from torch.cuda.amp import autocast, GradScaler  # Same API
```

  5. TensorCore optimization available
     • Set torch.backends.musa.matmul.allow_tf32 = True for TensorFloat32
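
Rules 1 and 2 can be combined into a small device picker that also works on machines without a MUSA environment. This is a minimal sketch: the function name `pick_device_name` is hypothetical, and it treats a failed import of `torch` or `torch_musa` as "no MUSA device", which may hide other installation problems.

```python
def pick_device_name():
    """Return "musa" when torch_musa imports and reports a device, else "cpu"."""
    try:
        import torch
        import torch_musa  # noqa: F401  (registers the torch.musa namespace)
    except ImportError:
        # Not a MUSA environment (or the packages are missing): fall back to CPU.
        return "cpu"
    return "musa" if torch.musa.is_available() else "cpu"


device_name = pick_device_name()
print(f"Using device: {device_name}")
```

The returned string can be passed straight to `torch.device(device_name)` or to `tensor.to(device_name)`.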

Model Templates

For common model types, see templates in references/:

  • reference.md - Complete MUSA API reference

Common Tasks

Check GPU Availability

```python
import torch
import torch_musa

print(f"MUSA available: {torch.musa.is_available()}")
print(f"Device count: {torch.musa.device_count()}")
print(f"Device name: {torch.musa.get_device_name(0)}")
```

Training Loop Pattern

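```python
import torch
import torch_musa  # registers the torch.musa backend

# model, inputs, targets, criterion and optimizer are assumed to be defined elsewhere

# Device setup
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")

# Model and data to device
model = model.to(device)
inputs = inputs.to(device)
targets = targets.to(device)

# Training step (same as CUDA)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
```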

Distributed Training (DDP)

```python
import torch
import torch.distributed as dist
import torch_musa

# rank, world_size and local_rank are assumed to come from the launcher

# Initialize with mccl backend
dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)

# Bind this process to its MUSA device
torch.musa.set_device(local_rank)  # torch.musa mirrors the torch.cuda API
```
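
The snippet above leaves `rank`, `world_size` and `local_rank` undefined. One common way to obtain them is from the `RANK`, `WORLD_SIZE` and `LOCAL_RANK` environment variables that `torchrun` sets. The sketch below assumes that launcher; the helper names `read_dist_env` and `setup_distributed` are hypothetical, and the use of `torch.musa.set_device` assumes torch_musa mirrors the CUDA call as the mapping table suggests.

```python
import os


def read_dist_env(env=None):
    """Read rank info from torchrun-style variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
    }


def setup_distributed():
    """Bind this process to its MUSA device and join the mccl process group."""
    # Imported lazily so read_dist_env stays usable without a MUSA environment.
    import torch
    import torch.distributed as dist
    import torch_musa  # noqa: F401

    cfg = read_dist_env()
    torch.musa.set_device(cfg["local_rank"])  # assumed to mirror torch.cuda.set_device
    dist.init_process_group(
        backend="mccl", rank=cfg["rank"], world_size=cfg["world_size"]
    )
    return cfg


print(read_dist_env({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "1"}))
```

Call `setup_distributed()` once at the start of each worker process, before building the model.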

Code Conversion

When converting existing CUDA code to MUSA:

  1. Add import torch_musa at the top
  2. Replace cuda with musa in device strings and in torch.cuda.* calls (see the CUDA vs MUSA table)
  3. Replace nccl with mccl for distributed backend
  4. Keep all other PyTorch API calls unchanged
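
The conversion steps can be approximated with a few textual rewrites. This is a rough sketch rather than a complete converter: the function name `cuda_to_musa` is hypothetical, the `torch.cuda` to `torch.musa` rewrite follows the mapping table in this document, and cases such as `.cuda()` method calls, shebang lines or `from __future__` imports are not handled, so the output should be reviewed by hand.

```python
import re


def cuda_to_musa(source: str) -> str:
    """Apply the conversion steps to a string of Python source code."""
    # torch.cuda.* -> torch.musa.* (per the CUDA vs MUSA table)
    out = re.sub(r"\btorch\.cuda\b", "torch.musa", source)
    # "cuda" / "cuda:0" device strings -> "musa" / "musa:0"
    out = re.sub(r"""(["'])cuda(:\d+)?\1""", r"\1musa\2\1", out)
    # 'nccl' backend -> 'mccl'
    out = re.sub(r"""(["'])nccl\1""", r"\1mccl\1", out)
    # Add the torch_musa import at the top if it is missing
    if "import torch_musa" not in out:
        out = "import torch_musa\n" + out
    return out


example = (
    "import torch\n"
    'device = torch.device("cuda:0")\n'
    "ok = torch.cuda.is_available()\n"
    "dist.init_process_group(backend='nccl')\n"
)
print(cuda_to_musa(example))
```

Running the function on its own output leaves the text unchanged, so it is safe to apply more than once.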

Troubleshooting

  • Device not found: Ensure user is in render group: sudo usermod -aG render $(whoami)
  • Library not found: Check LD_LIBRARY_PATH includes /usr/local/musa/lib/
  • Build issues: Clean and rebuild: python setup.py clean && bash build.sh
  • Docker issues: Use --env MTHREADS_VISIBLE_DEVICES=all
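
The first two checks can be scripted. The sketch below uses only the Python standard library on Linux; the helper names `in_group` and `lib_path_includes` are hypothetical. Group membership is read from the running process, so a change made with `usermod` only shows up after logging in again.

```python
import grp
import os


def in_group(group_name="render"):
    """True if the current process belongs to the named group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    return gid in os.getgroups() or gid == os.getegid()


def lib_path_includes(path="/usr/local/musa/lib", ld_library_path=None):
    """True if `path` is one of the LD_LIBRARY_PATH entries."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    wanted = os.path.normpath(path)  # normalizes away a trailing slash
    entries = [p for p in ld_library_path.split(":") if p]
    return any(os.path.normpath(p) == wanted for p in entries)


print("In render group:", in_group("render"))
print("MUSA lib dir on LD_LIBRARY_PATH:", lib_path_includes())
```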

Reference

For detailed API reference and examples, see references/reference.md.

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-19 16:26, security scan: safe

