
Gemma Gemma3

Gemma 3 by Google — run Gemma 3 (4B, 12B, 27B) across your local device fleet. Google's most capable open model with 128K context, strong coding, and multili...
twinsgeeks
Uncategorized · clawhub · v1.0.1 · 1 version · Key: not required · 0 stars · 463 downloads · 2 installs
#128k-context #apple-silicon #codegemma #fleet-routing #gemma #gemma-3 #google-gemma #latest #local-llm #mac-studio #multilingual #ollama #open-source

Overview

Gemma 3 — Run Google's Open Models Across Your Fleet

Gemma 3 is Google's most capable open model family, with a 128K context window (32K on the 1B model), strong coding performance, and multilingual support across 140+ languages. The fleet router picks the best device for every request, so no manual load balancing is needed.

Supported Gemma models

| Model | Parameters | Ollama name | Best for |
|-------|------------|-------------|----------|
| Gemma 3 27B | 27B | gemma3:27b | Highest quality; rivals much larger models |
| Gemma 3 12B | 12B | gemma3:12b | Balanced quality and speed |
| Gemma 3 4B | 4B | gemma3:4b | Fast, runs on low-RAM devices |
| Gemma 3 1B | 1B | gemma3:1b | Ultra-light, instant responses |
| CodeGemma 7B | 7B | codegemma | Code-focused variant |

Quick start

pip install ollama-herd    # PyPI: https://pypi.org/project/ollama-herd/
herd                       # start the router (port 11435)
herd-node                  # run on each device — finds the router automatically

No models are downloaded during installation. Models are pulled either manually via the dashboard or on demand when a request arrives, and every pull requires user confirmation unless you have opted in via auto_pull.

Use Gemma through the fleet

OpenAI SDK (drop-in replacement)

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")

# Gemma 3 27B for complex reasoning
response = client.chat.completions.create(
    model="gemma3:27b",
    messages=[{"role": "user", "content": "Explain quantum entanglement to a 10-year-old"}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Code generation with CodeGemma

response = client.chat.completions.create(
    model="codegemma",
    messages=[{"role": "user", "content": "Write a binary search tree in Rust with insert, delete, and search"}],
)
print(response.choices[0].message.content)

curl (Ollama format)

# Gemma 3 27B
curl http://localhost:11435/api/chat -d '{
  "model": "gemma3:27b",
  "messages": [{"role": "user", "content": "Translate to Japanese: The weather is beautiful today"}],
  "stream": false
}'
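
The same Ollama-format request can also be sent from Python's standard library. The following is a minimal sketch, assuming the router forwards `/api/chat` and returns Ollama's usual non-streaming response shape (a JSON object with a `message.content` field). The helper names `build_chat_request`, `extract_reply`, and `chat` are illustrative and not part of ollama-herd.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:11435"  # router port from the quick start


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to the router's Ollama-format chat endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{ROUTER_URL}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(payload: dict) -> str:
    """Return the assistant text from a non-streaming response (assumed shape)."""
    return payload.get("message", {}).get("content", "")


def chat(model: str, prompt: str) -> str:
    """Send the request and return the reply; requires a running router."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return extract_reply(json.load(resp))
```

With a router running, `chat("gemma3:27b", "Translate to Japanese: The weather is beautiful today")` would return the reply text.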

curl (OpenAI format)

curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hello"}]}'

Which Gemma for your hardware

> Cross-platform: These are example configurations. Any device (Mac, Linux, Windows) with equivalent RAM works. The fleet router runs on all platforms.

| Device | RAM | Best Gemma model |
|--------|-----|------------------|
| MacBook Air (8GB) | 8GB | gemma3:1b (instant responses) |
| Mac Mini (16GB) | 16GB | gemma3:4b (strong for its size) |
| Mac Mini (24GB) | 24GB | gemma3:12b (great balance) |
| MacBook Pro (36GB) | 36GB | gemma3:27b (full power) |
| Mac Studio (64GB+) | 64GB+ | gemma3:27b + codegemma simultaneously |
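
The table above can be read as a simple threshold lookup. The sketch below copies its RAM tiers into a hypothetical helper, `pick_gemma_model`, which is not part of ollama-herd. It returns a single model tag, so the 64GB+ row (two models loaded at once) is not modeled, and the thresholds are the example values from the table rather than hard requirements.

```python
from typing import Optional

# (minimum RAM in GB, model tag), largest tier first; values from the table above.
RAM_TIERS = [
    (36, "gemma3:27b"),
    (24, "gemma3:12b"),
    (16, "gemma3:4b"),
    (8, "gemma3:1b"),
]


def pick_gemma_model(ram_gb: float) -> Optional[str]:
    """Return the largest Gemma 3 tag whose tier fits in ram_gb, else None."""
    for min_ram, tag in RAM_TIERS:
        if ram_gb >= min_ram:
            return tag
    return None  # below the smallest tier listed in the table
```

For a device with 32GB of RAM this falls back to `gemma3:12b`, since the 27B tier in the table starts at 36GB.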

Why Gemma locally

  • 128K context — process entire codebases and long documents
  • 140+ languages — multilingual without switching models
  • Google quality, zero cost — no per-token charges after hardware
  • Privacy — all data stays on your network
  • Fleet routing — multiple machines share the load
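
Before sending a long document, it can help to check roughly whether it fits in the 128K window. The sketch below is a back-of-the-envelope estimate only: the 4-characters-per-token ratio is an assumption (real tokenizers vary by language and content), and the function names are hypothetical.

```python
CONTEXT_TOKENS = 128_000  # context window stated in the list above
CHARS_PER_TOKEN = 4       # rough assumption; actual tokenization varies


def estimate_tokens(text: str) -> int:
    """Estimate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserve_for_reply: int = 2_000) -> bool:
    """True if the estimated prompt plus a reply budget fits in the window."""
    return estimate_tokens(text) + reserve_for_reply <= CONTEXT_TOKENS
```

Treat a result near the limit as a prompt to split the input instead of as a guarantee.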

Check what's running

# Models loaded in memory
curl -s http://localhost:11435/api/ps | python3 -m json.tool

# Fleet health
curl -s http://localhost:11435/dashboard/api/health | python3 -m json.tool

Web dashboard at http://localhost:11435/dashboard — live monitoring.
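
The `/api/ps` output can also be inspected from a script. This is a minimal sketch, assuming the router returns the same shape as Ollama's own `/api/ps` (a `models` array whose entries carry a `name` field). The helpers `loaded_model_names` and `fetch_ps` are illustrative names, not part of ollama-herd.

```python
import json
import urllib.request


def loaded_model_names(ps_payload: dict) -> list:
    """Return the names of models reported as loaded, sorted for stable output."""
    return sorted(m.get("name", "") for m in ps_payload.get("models", []))


def fetch_ps(base_url: str = "http://localhost:11435") -> dict:
    """GET /api/ps from the router; requires a running fleet."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.load(resp)
```

With a router running, `loaded_model_names(fetch_ps())` lists what is currently in memory.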

Also available on this fleet

Other LLMs

Llama 3.3, Qwen 3.5, DeepSeek-V3, DeepSeek-R1, Phi 4, Mistral, Codestral — same endpoint.

Image generation

curl -o image.png http://localhost:11435/api/generate-image \
  -d '{"model": "z-image-turbo", "prompt": "a gemstone catching light", "width": 1024, "height": 1024}'

Speech-to-text

curl http://localhost:11435/api/transcribe -F "file=@meeting.wav" -F "model=qwen3-asr"

Embeddings

curl http://localhost:11435/api/embed \
  -d '{"model": "nomic-embed-text", "input": "Google Gemma open source language model"}'
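
Embeddings are usually compared with cosine similarity. The sketch below assumes the endpoint returns Ollama's `/api/embed` shape (an `embeddings` field holding one vector per input); `vectors_from_response` and `cosine_similarity` are illustrative helpers, not part of ollama-herd.

```python
import math


def vectors_from_response(payload: dict) -> list:
    """Return the list of embedding vectors from an embed response (assumed shape)."""
    return payload.get("embeddings", [])


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Values close to 1 indicate texts the embedding model considers similar.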

Full documentation

Contribute

Ollama Herd is open source (MIT). Stars, issues, and PRs welcome — from humans and AI agents alike:

  • GitHub — 444 tests, fully async, CLAUDE.md makes AI agents productive instantly
  • Found a bug? Open an issue
  • Want to add a feature? Fork, branch, PR — the test suite runs in under 40 seconds

Guardrails

  • Model downloads require explicit user confirmation — Gemma models range from 1GB (1B) to 16GB (27B).
  • Model deletion requires explicit user confirmation.
  • Never delete or modify files in ~/.fleet-manager/.
  • No models are downloaded automatically — all pulls are user-initiated or require opt-in via auto_pull.
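
For scripts or agents that trigger pulls themselves, the confirmation guardrail can be mirrored on the client side. This is a hypothetical sketch of the pattern, not the router's own implementation: `confirm_pull` is an invented helper, and the size passed in is whatever estimate the caller has.

```python
def confirm_pull(model: str, size_gb: float, ask=input) -> bool:
    """Ask before downloading; only an explicit 'y' or 'yes' counts as consent."""
    answer = ask(f"Pull {model} (~{size_gb:g} GB)? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

The `ask` parameter defaults to `input` and can be replaced with any callable, which keeps the check testable without a terminal.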

Version history

1 version in total

  • v1.0.1 (current)
    2026-05-03 06:58 · Safe · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
