
DeepSeek — DeepSeek-V3, DeepSeek-R1, DeepSeek-Coder on Your Local Devices

DeepSeek models on your local fleet — DeepSeek-V3, DeepSeek-V3.2, DeepSeek-R1, DeepSeek-Coder routed across multiple devices via Ollama Herd. 7-signal scorin...
Author: twinsgeeks
Version: v1.0.1 (clawhub) · API key: not required
Tags: #apple-silicon #code-generation #deepseek #deepseek-coder #deepseek-coder-v2 #deepseek-r1 #deepseek-v3 #deepseek-v3.2 #latest #local-llm #ollama #reasoning

Overview

DeepSeek — Run DeepSeek Models Across Your Local Fleet

Run DeepSeek-V3, DeepSeek-R1, and DeepSeek-Coder on your own hardware. The fleet router picks the best device for every request — no cloud API needed, zero per-token costs, all data stays on your machines.

Supported DeepSeek models

| Model | Parameters | Ollama name | Best for |
|-------|------------|-------------|----------|
| DeepSeek-V3 | 671B MoE (37B active) | deepseek-v3 | General: matches GPT-4o on most benchmarks |
| DeepSeek-V3.1 | 671B MoE | deepseek-v3.1 | Hybrid thinking/non-thinking modes |
| DeepSeek-V3.2 | 671B MoE | deepseek-v3.2 | Improved reasoning + agent performance |
| DeepSeek-R1 | 1.5B–671B | deepseek-r1 | Reasoning: approaches O3 and Gemini 2.5 Pro |
| DeepSeek-Coder | 1.3B–33B | deepseek-coder | Code generation (87% code, 13% NL training) |
| DeepSeek-Coder-V2 | 236B MoE (21B active) | deepseek-coder-v2 | Code: matches GPT-4 Turbo on code tasks |

Setup

pip install ollama-herd
herd              # start the router (port 11435)
herd-node         # run on each machine

# Pull a DeepSeek model
ollama pull deepseek-r1:70b

Package: ollama-herd | Repo: github.com/geeks-accelerator/ollama-herd
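
Before sending requests, it can help to confirm that the router is up and that the model you pulled is visible through it. The sketch below assumes the router proxies Ollama's model-listing endpoint (`/api/tags`) on port 11435; that pass-through is an assumption, not something stated above, and `has_model` and `fetch_tags` are hypothetical helper names.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # router port from the setup steps


def has_model(tags: dict, name: str) -> bool:
    """Return True if `name` appears in an Ollama-style /api/tags payload."""
    return any(m.get("name") == name for m in tags.get("models", []))


def fetch_tags(base_url: str = ROUTER_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags. Assumes the router forwards this Ollama endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

A typical check would be `has_model(fetch_tags(), "deepseek-r1:70b")`, wrapped in a `try/except OSError` so that a router that is not running is reported rather than raising.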

Use DeepSeek through the fleet

OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# DeepSeek-R1 for reasoning
response = client.chat.completions.create(
    model="deepseek-r1:70b",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
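
DeepSeek-R1 typically emits its chain of thought before the final answer, and in many Ollama setups that reasoning arrives inline, wrapped in `<think>...</think>` tags. Whether the tags appear depends on the Ollama version and model template (some setups return the reasoning in a separate field), so treat the following as a minimal sketch under that assumption. `split_reasoning` is a hypothetical helper that operates on the fully assembled reply text.

```python
import re

# Matches each <think>...</think> block, including newlines inside it.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completed R1 reply into (reasoning, answer)."""
    reasoning = "\n".join(part.strip() for part in _THINK_RE.findall(text))
    answer = _THINK_RE.sub("", text).strip()
    return reasoning, answer
```

When streaming, collect the chunks into one string first and call `split_reasoning` on the result; if the reply contains no tags, the reasoning part is an empty string and the answer is the reply itself.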

DeepSeek-Coder for code

response = client.chat.completions.create(
    model="deepseek-coder-v2:16b",
    messages=[{"role": "user", "content": "Write a Redis cache decorator in Python"}],
)
print(response.choices[0].message.content)

Ollama API

# DeepSeek-V3 general chat
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-v3",
  "messages": [{"role": "user", "content": "Explain quantum computing"}],
  "stream": false
}'

# DeepSeek-R1 reasoning
curl http://localhost:11435/api/chat -d '{
  "model": "deepseek-r1:70b",
  "messages": [{"role": "user", "content": "Solve this step by step: ..."}],
  "stream": false
}'
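
The same Ollama-style call can be made from Python with only the standard library. This is a sketch rather than an official client: `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the router returns Ollama's usual non-streaming shape, with the reply under `message.content`.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:11435") -> urllib.request.Request:
    """Build a non-streaming POST to the Ollama-style /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["message"]["content"]  # assumed Ollama response shape
```

For example, `chat("deepseek-v3", "Explain quantum computing")` mirrors the first curl command. The generous default timeout is a guess to allow for a large model loading on first use.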

Hardware recommendations

DeepSeek models are large. Here's what fits where:

| Model | Min RAM | Recommended hardware |
|-------|---------|----------------------|
| deepseek-r1:1.5b | 4GB | Any Mac |
| deepseek-r1:7b | 8GB | Mac Mini M4 (16GB) |
| deepseek-r1:14b | 12GB | Mac Mini M4 (24GB) |
| deepseek-r1:32b | 24GB | Mac Mini M4 Pro (48GB) |
| deepseek-r1:70b | 48GB | Mac Studio M4 Max (128GB) |
| deepseek-coder-v2:16b | 12GB | Mac Mini M4 (24GB) |
| deepseek-v3 | 256GB+ | Mac Studio M3 Ultra (512GB) |

The fleet router automatically sends requests to the machine where the model is loaded — no manual routing needed.
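
The minimum-RAM figures in the hardware table can be turned into a small lookup for choosing which deepseek-r1 tag to pull on a given machine. The numbers below are copied from that table; `largest_fitting_variant` is a hypothetical helper, and treating "minimum RAM" as the only constraint is a simplification.

```python
from typing import Optional

# Minimum RAM (GB) per deepseek-r1 tag, copied from the hardware table.
R1_MIN_RAM_GB = {
    "deepseek-r1:1.5b": 4,
    "deepseek-r1:7b": 8,
    "deepseek-r1:14b": 12,
    "deepseek-r1:32b": 24,
    "deepseek-r1:70b": 48,
}


def largest_fitting_variant(available_ram_gb: float) -> Optional[str]:
    """Return the most demanding R1 tag whose minimum RAM fits, or None."""
    fitting = [(ram, tag) for tag, ram in R1_MIN_RAM_GB.items()
               if ram <= available_ram_gb]
    return max(fitting)[1] if fitting else None
```

On a 16GB machine this selects `deepseek-r1:14b`; below 4GB it returns `None`, which is the cue to pick different hardware rather than a smaller tag.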

Why run DeepSeek locally

  • Zero cost — DeepSeek API charges per token. Local is free after hardware.
  • Privacy — code and business data never leave your network.
  • No rate limits — DeepSeek API throttles during peak hours. Local has no throttle.
  • Availability — DeepSeek API has had outages. Your hardware doesn't depend on their servers.
  • Fleet routing — multiple machines share the load. One busy? Request goes to the next.

Fleet features

  • 7-signal scoring — picks the optimal node for every request
  • Auto-retry — fails over to next best node transparently
  • VRAM-aware fallback — routes to a loaded model in the same category instead of cold-loading
  • Context protection — prevents expensive model reloads from num_ctx changes
  • Request tagging — track per-project DeepSeek usage
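
To make the VRAM-aware fallback concrete, here is a toy illustration of the preference order it describes: a node with the requested model already loaded, then a loaded model in the same category, then a cold load. This is not ollama-herd's implementation and does not model the 7-signal scoring; the `Node` record, the `CATEGORY` map, and `route` are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical node record, not ollama-herd's data model
    name: str
    loaded: set = field(default_factory=set)  # models currently in memory
    busy: bool = False


# Hypothetical category map; the router's real categories are not documented here.
CATEGORY = {
    "deepseek-r1:70b": "reasoning",
    "deepseek-r1:32b": "reasoning",
    "deepseek-coder-v2:16b": "code",
}


def route(model, nodes):
    """Return (node_name, model_to_run), preferring already-loaded models."""
    idle = [n for n in nodes if not n.busy]
    # 1. A node that already has the requested model loaded.
    for n in idle:
        if model in n.loaded:
            return n.name, model
    # 2. A loaded model in the same category (avoids a cold load).
    wanted = CATEGORY.get(model)
    for n in idle:
        for other in sorted(n.loaded):
            if wanted is not None and CATEGORY.get(other) == wanted:
                return n.name, other
    # 3. Otherwise cold-load on the first idle node, if there is one.
    return (idle[0].name, model) if idle else None
```

In this sketch, a request for `deepseek-r1:70b` while the only idle node holds `deepseek-r1:32b` is served by the 32b model instead of waiting for a 70b load.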

Also available on this fleet

Other LLM models

Llama 3.3, Qwen 3.5, Phi 4, Mistral, Gemma 3 — any Ollama model routes through the same endpoint.

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -H "Content-Type: application/json" \
  -d '{"model":"z-image-turbo","prompt":"a sunset","width":1024,"height":1024,"steps":4}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "audio=@recording.wav"

Embeddings

curl http://localhost:11435/api/embeddings -d '{"model":"nomic-embed-text","prompt":"query"}'
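
Ollama's `/api/embeddings` endpoint returns a JSON object with an `embedding` list of floats; assuming the router passes that shape through unchanged, two such vectors can be compared with cosine similarity. `cosine_similarity` below is a plain-Python sketch, not part of the fleet tooling.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; report 0.0 instead of dividing by zero.
    return dot / norm if norm else 0.0
```

Values near 1 indicate that the two texts are close in the embedding space; values near 0 indicate that they are unrelated.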

Dashboard

http://localhost:11435/dashboard — monitor DeepSeek requests alongside all other models. Per-model latency, token throughput, health checks.

Full documentation

Agent Setup Guide

Guardrails

  • Never pull or delete DeepSeek models without user confirmation — downloads are 4-400+ GB.
  • Never delete or modify files in ~/.fleet-manager/.
  • If a DeepSeek model is too large for available memory, suggest a smaller variant.

Version history

  • v1.0.1 (current), released 2026-05-07 08:20

