Automate any X11 Linux desktop: capture screens, find and click elements, type, use hotkeys,
manage windows.
Preferred screenshot interpretation path: capture with capture.sh and interpret the image directly in your OpenClaw chat (existing image-capable model connection).
DISPLAY environment variable set (usually :0)bash install.sh once to install dependencies| Task | Command |
|---|---|
| ------ | --------- |
| Take screenshot | bash capture.sh |
| Screenshot of window | bash capture.sh --window "Firefox" |
| List windows | bash inspect.sh |
| Active window info | bash inspect.sh --active |
| Find window by name | bash inspect.sh --window "Firefox" |
| Click at coordinates | bash click.sh --x 500 --y 300 |
| Right-click | bash click.sh --x 500 --y 300 --button right |
| Double-click | bash click.sh --x 500 --y 300 --double |
| Click relative to window | bash click.sh --window "Firefox" --x 200 --y 150 |
| Type text | bash type.sh "hello world" |
| Type into window | bash type.sh --window "Terminal" "ls -la" |
| Send hotkey | bash hotkey.sh "ctrl+c" |
| Send Enter | bash hotkey.sh "Return" |
| Scroll down | bash scroll.sh --direction down --amount 3 |
| Scroll up at position | bash scroll.sh --x 500 --y 300 --direction up --amount 3 |
| Focus window | bash window.sh --action focus --window "Firefox" |
| Minimize window | bash window.sh --action minimize --window "Firefox" |
| Maximize window | bash window.sh --action maximize --window "Firefox" |
| Close window | bash window.sh --action close --window "Firefox" |
| Move window | bash window.sh --action move --window "Firefox" --x 100 --y 50 |
| Resize window | bash window.sh --action resize --window "Firefox" --width 1280 --height 800 |
For most GUI automation tasks, follow this pattern:
capture.sh — note the file path printedclick.sh --x X --y Y# Step 1: Capture the screen
SCREENSHOT=$(bash capture.sh | tail -1)
# Step 2: Look at the screenshot (read the image file with your vision)
# Examine the image and identify the Save button's position
# Step 3: Click at the coordinates you identified
bash click.sh --x 450 --y 320
# Focus the terminal window and type a command
bash type.sh --window "Terminal" "ls -la"
bash hotkey.sh "Return"
# Maximize Firefox, then focus a terminal
bash window.sh --action maximize --window "Firefox"
bash window.sh --action focus --window "Terminal"
All tools support a --json flag for machine-readable output:
{"success": true, "output": "...", "error": null}
On failure:
{"success": false, "output": null, "error": "Error description"}
If DISPLAY is not set (e.g., running over SSH), set it before calling any tool:
export DISPLAY=:0
For headless servers with a virtual display:
Xvfb :0 -screen 0 1920x1080x24 &
export DISPLAY=:0
Key names follow X11 conventions:
| Key | Name |
|---|---|
| ----- | ------ |
| Enter | Return |
| Tab | Tab |
| Escape | Escape |
| Backspace | BackSpace |
| Delete | Delete |
| Home | Home |
| End | End |
| Page Up | Page_Up |
| Page Down | Page_Down |
| F1-F12 | F1 through F12 |
| Super/Win | super |
| Ctrl | ctrl |
| Alt | alt |
| Shift | shift |
Combine with +: ctrl+c, ctrl+shift+t, alt+F4, super+d
共 1 个版本