MUSA Torch Coding

Guide for generating PyTorch code that runs on Moore Threads (摩尔线程) MUSA GPUs using torch_musa.

Overview

MUSA (Metaverse Unified System Architecture) is Moore Threads' GPU computing platform. This skill helps generate code that:

Runs on Moore Threads GPUs via torch_musa
Converts CUDA code to MUSA-compatible code
Sets up proper environments (conda v1.2/v1.3)
Follows MUSA best practices

Key Differences: CUDA vs MUSA

| CUDA | MUSA |
| ------------------------------ | ------------------------------ |
| torch.cuda | torch.musa |
| torch.device("cuda") | torch.device("musa") |
| torch.cuda.is_available() | torch.musa.is_available() |
| backend='nccl' | backend='mccl' |
| torch.cuda.device_count() | torch.musa.device_count() |
| torch.cuda.get_device_name() | torch.musa.get_device_name() |

Environment Setup

⚠️ Important: MUSA Uses Pre-configured Conda Environments

DO NOT install PyTorch, vLLM, or related packages manually. MUSA environments are custom-built and include:

MUSA-specific PyTorch builds (not compatible with standard PyTorch)
MUSA-customized vLLM versions
MUSA drivers and SDK integration

Installing standard packages from PyPI will break the environment.

Conda Environment (v1.2/v1.3)

MUSA provides pre-configured conda environments. Common environment names:

v1.2 - MUSA SDK v1.2 environment
v1.3 - MUSA SDK v1.3 environment (newer)

# List available MUSA environments
conda env list | grep -E "(v1\.2|v1\.3|musa)"

# Activate the appropriate environment
conda activate v1.2  # or v1.3

# Verify MUSA availability
python -c "import torch_musa; import torch; print(torch.musa.is_available())"

Environment Detection & Setup

If no MUSA conda environment is detected:

Check if MUSA is installed:

```bash
which musaInfo        # Should print the musaInfo path
ls /usr/local/musa/   # MUSA SDK location
```

If MUSA is not set up:

Use the musa-env-setup skill for complete environment installation
The skill covers SDK installation, conda setup, and vLLM-MUSA configuration

Common conda environment locations:

/opt/conda/envs/
~/conda/envs/
/usr/local/conda/envs/
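The detection steps above can also be scripted. The following is a minimal sketch, assuming the environment names (v1.2, v1.3), the conda locations, and the SDK path listed in this section; the function names `find_musa_envs` and `musa_sdk_status` are illustrative and not part of any MUSA tooling.

```python
import shutil
from pathlib import Path

# Locations and names taken from this guide; adjust for your machine.
DEFAULT_ENV_ROOTS = ("/opt/conda/envs", "~/conda/envs", "/usr/local/conda/envs")
ENV_NAMES = ("v1.2", "v1.3")


def find_musa_envs(roots=DEFAULT_ENV_ROOTS, names=ENV_NAMES):
    """Return the paths of conda environments whose directory name matches `names`."""
    found = []
    for root in roots:
        base = Path(root).expanduser()
        for name in names:
            candidate = base / name
            if candidate.is_dir():
                found.append(candidate)
    return found


def musa_sdk_status(sdk_dir="/usr/local/musa"):
    """Report whether musaInfo is on PATH and whether the SDK directory exists."""
    return {
        "musaInfo": shutil.which("musaInfo"),      # None when not on PATH
        "sdk_dir_exists": Path(sdk_dir).is_dir(),
    }
```

If `find_musa_envs()` returns an empty list and `musa_sdk_status()` reports no SDK, the machine most likely needs the musa-env-setup skill mentioned above.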

Key Environment Variables

| Variable | Purpose |
| ------------------------------ | ------------------------- |
| MUSA_VISIBLE_DEVICES=0,1,2,3 | Control visible GPU IDs |
| MUSA_LAUNCH_BLOCKING=1 | Synchronous kernel launch |
| MUDNN_LOG_LEVEL=INFO | Enable MUDNN logging |
| TORCH_SHOW_CPP_STACKTRACES=1 | Show C++ stack traces |
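These variables can be set from Python instead of the shell. The sketch below assumes they are read when the MUSA runtime initializes (as their CUDA counterparts are), so the helper should run before `torch` and `torch_musa` are imported. The helper name `configure_musa_env` is hypothetical.

```python
import os

_MUSA_KEYS = (
    "MUSA_VISIBLE_DEVICES",
    "MUSA_LAUNCH_BLOCKING",
    "MUDNN_LOG_LEVEL",
    "TORCH_SHOW_CPP_STACKTRACES",
)


def configure_musa_env(visible_devices=None, debug=False, env=os.environ):
    """Set the variables from the table above and return the ones now present."""
    if visible_devices is not None:
        # e.g. [0, 1] becomes "0,1"
        env["MUSA_VISIBLE_DEVICES"] = ",".join(str(i) for i in visible_devices)
    if debug:
        env["MUSA_LAUNCH_BLOCKING"] = "1"        # synchronous kernel launches
        env["MUDNN_LOG_LEVEL"] = "INFO"          # MUDNN logging
        env["TORCH_SHOW_CPP_STACKTRACES"] = "1"  # C++ stack traces
    return {key: env[key] for key in _MUSA_KEYS if key in env}
```

Calling `configure_musa_env([0, 1], debug=True)` at the very top of a script restricts it to the first two GPUs and enables the debugging switches.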

Code Generation Rules

When generating PyTorch code for MUSA:

Always import torch_musa

```python
import torch_musa  # Must import before using torch.musa
```

Use torch.device("musa")

```python
import torch
import torch_musa

device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")
tensor = torch.tensor([1.0, 2.0], device=device)
```

Use 'mccl' for distributed training

```python
import torch.distributed as dist

dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)
```

Mixed precision (AMP) is supported

```python
# torch.musa.amp mirrors the torch.cuda.amp API
autocast, GradScaler = torch.musa.amp.autocast, torch.musa.amp.GradScaler
```

TensorCore optimization available

Set torch.backends.musa.matmul.allow_tf32 = True for TensorFloat32
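The device and backend rules above can be combined into two small helpers. This is a sketch, not part of torch_musa: `detect_device_type` and `backend_for` are illustrative names, and `gloo` for CPU is the usual PyTorch choice rather than something this guide specifies.

```python
# Distributed backend per device type: mccl for MUSA, nccl for CUDA,
# and gloo as the usual PyTorch backend for CPU-only runs.
BACKEND_FOR_DEVICE = {"musa": "mccl", "cuda": "nccl", "cpu": "gloo"}


def backend_for(device_type: str) -> str:
    """Map a device type to the torch.distributed backend name."""
    try:
        return BACKEND_FOR_DEVICE[device_type]
    except KeyError:
        raise ValueError(f"unsupported device type: {device_type!r}") from None


def detect_device_type() -> str:
    """Return "musa" when torch_musa imports and reports a GPU, else "cpu"."""
    try:
        import torch
        import torch_musa  # noqa: F401  (importing registers torch.musa)
    except (ImportError, OSError):
        return "cpu"
    return "musa" if torch.musa.is_available() else "cpu"
```

Generated code can then call `backend_for(detect_device_type())` and run unchanged on machines with or without a Moore Threads GPU.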

Model Templates

For common model types, see templates in references/:

reference.md - Complete MUSA API reference

Common Tasks

Check GPU Availability

import torch
import torch_musa

print(f"MUSA available: {torch.musa.is_available()}")
print(f"Device count: {torch.musa.device_count()}")
print(f"Device name: {torch.musa.get_device_name(0)}")
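On multi-GPU machines it can help to enumerate every device rather than only index 0. The sketch below uses only the `torch.musa` calls shown above; the function name `list_musa_devices` is illustrative, and returning an empty list when torch_musa is missing is a design choice of this example.

```python
def list_musa_devices():
    """Return the name of every visible MUSA device, or [] when none is usable."""
    try:
        import torch
        import torch_musa  # noqa: F401  (importing registers torch.musa)
    except (ImportError, OSError):
        return []
    if not torch.musa.is_available():
        return []
    return [
        torch.musa.get_device_name(index)
        for index in range(torch.musa.device_count())
    ]
```

A caller might print the result with `for i, name in enumerate(list_musa_devices()): print(i, name)`.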

Training Loop Pattern

import torch
import torch_musa

# Device setup
device = torch.device("musa") if torch.musa.is_available() else torch.device("cpu")

# Model and data to device
# (model, inputs, targets, criterion, and optimizer are assumed to be defined)
model = model.to(device)
inputs = inputs.to(device)
targets = targets.to(device)

# Training (same as CUDA)
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
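The pattern above leaves the model, data, loss, and optimizer undefined. A minimal end-to-end sketch, assuming a toy `nn.Linear` regression on random data (all names here are illustrative), might look like the following. Imports are deferred into the function so the module can be loaded on machines without torch, and the code falls back to CPU when torch_musa is not installed.

```python
def train_demo(steps=20, lr=0.1):
    """Fit a tiny linear model and return the loss recorded at each step."""
    import torch
    from torch import nn

    try:
        import torch_musa  # noqa: F401  (importing registers torch.musa)
        use_musa = torch.musa.is_available()
    except (ImportError, OSError):
        use_musa = False
    device = torch.device("musa" if use_musa else "cpu")

    torch.manual_seed(0)
    model = nn.Linear(4, 1).to(device)
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    # Synthetic regression target: the sum of the four input features.
    inputs = torch.randn(64, 4).to(device)
    targets = inputs.sum(dim=1, keepdim=True)

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses
```

The loop body is identical to the CUDA version; only the device selection differs.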

Distributed Training (DDP)

import torch
import torch.distributed as dist
import torch_musa

# Bind this process to its GPU before creating the process group
torch.musa.set_device(local_rank)

# Initialize with mccl backend
dist.init_process_group(backend='mccl', rank=rank, world_size=world_size)
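The snippet above assumes `rank`, `world_size`, and `local_rank` already exist. When a job is launched with `torchrun`, these values arrive as the environment variables `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`. The sketch below reads them; the helper names are hypothetical, and the single-process defaults are an assumption made for local debugging.

```python
import os


def read_torchrun_env(env=os.environ):
    """Read the rank settings torchrun exports, defaulting to a single process."""
    return {
        "rank": int(env.get("RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
    }


def setup_musa_ddp(env=os.environ):
    """Bind the process to its GPU and join the mccl process group."""
    import torch
    import torch.distributed as dist
    import torch_musa  # noqa: F401  (importing registers torch.musa)

    cfg = read_torchrun_env(env)
    torch.musa.set_device(cfg["local_rank"])
    dist.init_process_group(
        backend="mccl", rank=cfg["rank"], world_size=cfg["world_size"]
    )
    return cfg
```

`setup_musa_ddp()` relies on the default `env://` rendezvous, so `MASTER_ADDR` and `MASTER_PORT` must also be set, which `torchrun` does automatically.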

Code Conversion

When converting existing CUDA code to MUSA:

Add import torch_musa at the top
Replace cuda with musa in device strings, and torch.cuda with torch.musa in API calls
Replace nccl with mccl for distributed backend
Keep all other PyTorch API calls unchanged
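The conversion steps above are mechanical enough to sketch as a text rewrite. The following is a rough illustration, not a complete migration tool: it handles only quoted device strings, `torch.cuda` attribute access, and the quoted `nccl` backend name, and it leaves everything else (including `.cuda()` method calls) untouched. The function name `convert_cuda_to_musa` is hypothetical.

```python
import re

_RULES = [
    # torch.cuda.is_available() -> torch.musa.is_available()
    (re.compile(r"\btorch\.cuda\b"), "torch.musa"),
    # "cuda" / 'cuda:1' -> "musa" / 'musa:1'
    (re.compile(r"""(["'])cuda(:\d+)?\1"""), r"\1musa\2\1"),
    # backend='nccl' -> backend='mccl'
    (re.compile(r"""(["'])nccl\1"""), r"\1mccl\1"),
]


def convert_cuda_to_musa(source: str) -> str:
    """Apply the conversion rules from this section to Python source text."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)

    # Add `import torch_musa` once, right after the first torch import.
    if not re.search(r"^\s*import torch_musa\b", source, flags=re.MULTILINE):
        lines = source.splitlines()
        for index, line in enumerate(lines):
            if re.match(r"(import torch|from torch)\b", line):
                lines.insert(index + 1, "import torch_musa")
                break
        else:
            lines.insert(0, "import torch_musa")
        trailing = "\n" if source.endswith("\n") else ""
        source = "\n".join(lines) + trailing
    return source
```

Running the function twice gives the same result as running it once, so it is safe to apply to partially converted files. Review the output by hand, since a regex pass cannot see device names built at runtime.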

Troubleshooting

Device not found: Ensure the user is in the render group: sudo usermod -aG render $(whoami), then log out and back in so the new group membership takes effect
Library not found: Check LD_LIBRARY_PATH includes /usr/local/musa/lib/
Build issues: Clean and rebuild: python setup.py clean && bash build.sh
Docker issues: Use --env MTHREADS_VISIBLE_DEVICES=all
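The first two checks above can be automated on Linux. This is a minimal diagnostic sketch using only the standard library; the function names are illustrative, and the group name and library path are the ones given in this section.

```python
import grp
import os


def in_group(name="render"):
    """Return True when the current process belongs to the named group."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        return False  # the group does not exist on this machine
    return gid == os.getgid() or gid in os.getgroups()


def musa_lib_on_path(lib_dir="/usr/local/musa/lib", env=os.environ):
    """Return True when lib_dir appears in LD_LIBRARY_PATH."""
    entries = [
        entry.rstrip("/")
        for entry in env.get("LD_LIBRARY_PATH", "").split(":")
        if entry
    ]
    return lib_dir.rstrip("/") in entries
```

If `in_group()` is still `False` right after running `usermod`, the current login session predates the change.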

Reference

For detailed API reference and examples, see references/reference.md.

Version History

1 version in total

v1.0.0 (current)

2026-03-19 16:26

