v1.0.0

Windows Control

Spliff7777

Full Windows desktop control. Mouse, keyboard, screenshots - interact with any Windows application like a human.

Downloads: 2.3k · Stars: 10 · Versions: 1 · Updated: 2026-02-23

Install

npx clawhub@latest install windows-control

Documentation

Windows Control Skill

Full desktop automation for Windows. Control mouse, keyboard, and screen like a human user.

Quick Start

All scripts are in skills/windows-control/scripts/

Screenshot

py screenshot.py > output.b64

Returns base64 PNG of entire screen.
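To view the capture, the base64 text has to be decoded back into a PNG file. The sketch below is one way to do that with the standard library. The helper names are illustrative and not part of the skill. The BOM check is an assumption for the case where the redirect was run in Windows PowerShell 5.1, which writes UTF-16 by default.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def read_b64_file(path: str) -> str:
    """Read the redirected output, tolerating a UTF-16 or UTF-8 BOM."""
    raw = Path(path).read_bytes()
    encoding = "utf-16" if raw.startswith((b"\xff\xfe", b"\xfe\xff")) else "utf-8-sig"
    return raw.decode(encoding)


def decode_screenshot(b64_text: str) -> bytes:
    """Decode base64 text into PNG bytes and check the PNG signature."""
    data = base64.b64decode("".join(b64_text.split()))  # drop newlines and spaces
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return data


def save_screenshot(b64_path: str, png_path: str) -> int:
    """Convert output.b64 into a viewable PNG; returns the number of bytes written."""
    return Path(png_path).write_bytes(decode_screenshot(read_b64_file(b64_path)))
```

For example, save_screenshot("output.b64", "screen.png") would write the image next to the base64 file.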

Click

py click.py 500 300 # Left click at (500, 300)

py click.py 500 300 right # Right click

py click.py 500 300 left 2 # Double click

Type Text

py type_text.py "Hello World"

Types text at current cursor position (10ms between keys).

Press Keys

py key_press.py "enter"

py key_press.py "ctrl+s"

py key_press.py "alt+tab"

py key_press.py "ctrl+shift+esc"
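The combo strings above join key names with "+". How key_press.py parses them is not shown here, so the following is only a sketch of one plausible approach: split the string into individual key names, which could then be passed to something like pyautogui.hotkey(*keys). The function name parse_combo is hypothetical.

```python
def parse_combo(combo: str) -> list[str]:
    """Split a combo such as "ctrl+shift+esc" into lowercase key names."""
    keys = [part.strip().lower() for part in combo.split("+")]
    if not all(keys):
        # Catches inputs like "ctrl++s" or a trailing "+".
        raise ValueError(f"empty key name in combo: {combo!r}")
    return keys
```

A single key such as "enter" comes back as a one-element list, so the same code path can handle both single keys and combinations.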

Move Mouse

py mouse_move.py 500 300

Moves mouse to coordinates (smooth 0.2s animation).

Scroll

py scroll.py up 5 # Scroll up 5 notches

py scroll.py down 10 # Scroll down 10 notches

Window Management (NEW!)

py focus_window.py "Chrome" # Bring window to front

py minimize_window.py "Notepad" # Minimize window

py maximize_window.py "VS Code" # Maximize window

py close_window.py "Calculator" # Close window

py get_active_window.py # Get title of active window

Advanced Actions (NEW!)

# Click by text (no coordinates needed!)

py click_text.py "Save" # Click "Save" button anywhere

py click_text.py "Submit" "Chrome" # Click "Submit" in Chrome only

# Drag and drop

py drag.py 100 100 500 300 # Drag from (100,100) to (500,300)

# Robust automation (wait/find)

py wait_for_text.py "Ready" "App" 30 # Wait up to 30s for text

py wait_for_window.py "Notepad" 10 # Wait for window to appear

py find_text.py "Login" "Chrome" # Get coordinates of text

py list_windows.py # List all open windows
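The wait scripts take a timeout in seconds. Their internals are not documented here, but waiting with a timeout is usually a polling loop. The sketch below shows that general pattern. wait_until and its parameters are illustrative names, and the 0.5 s polling interval is an assumption.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll condition() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False  # timed out without the condition becoming true
        sleep(interval)
```

The condition could be any check, for example a function that runs read_window.py and looks for "Ready" in its output. The clock and sleep parameters exist only so the loop can be tested without real delays.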

Read Window Text

py read_window.py "Notepad" # Read all text from Notepad

py read_window.py "Visual Studio" # Read text from VS Code

py read_window.py "Chrome" # Read text from browser

Uses Windows UI Automation to extract the actual text (not OCR), which is much faster and more accurate than reading text from screenshots.

Read UI Elements (NEW!)

py read_ui_elements.py "Chrome" # All interactive elements

py read_ui_elements.py "Chrome" --buttons-only # Just buttons

py read_ui_elements.py "Chrome" --links-only # Just links

py read_ui_elements.py "Chrome" --json # JSON output

Returns buttons, links, tabs, checkboxes, dropdowns with coordinates for clicking.

Read Webpage Content (NEW!)

py read_webpage.py # Read active browser

py read_webpage.py "Chrome" # Target Chrome specifically

py read_webpage.py "Chrome" --buttons # Include buttons

py read_webpage.py "Chrome" --links # Include links with coords

py read_webpage.py "Chrome" --full # All elements (inputs, images)

py read_webpage.py "Chrome" --json # JSON output

Enhanced browser content extraction with headings, text, buttons, and links.

Handle Dialogs (NEW!)

# List all open dialogs

py handle_dialog.py list

# Read current dialog content

py handle_dialog.py read

py handle_dialog.py read --json

# Click button in dialog

py handle_dialog.py click "OK"

py handle_dialog.py click "Save"

py handle_dialog.py click "Yes"

# Type into dialog text field

py handle_dialog.py type "myfile.txt"

py handle_dialog.py type "C:\path\to\file" --field 0

# Dismiss dialog (auto-finds OK/Close/Cancel)

py handle_dialog.py dismiss

# Wait for dialog to appear

py handle_dialog.py wait --timeout 10

py handle_dialog.py wait "Save As" --timeout 5

Handles Save/Open dialogs, message boxes, alerts, confirmations, etc.

Click Element by Name (NEW!)

py click_element.py "Save" # Click "Save" anywhere

py click_element.py "OK" --window "Notepad" # In specific window

py click_element.py "Submit" --type Button # Only buttons

py click_element.py "File" --type MenuItem # Menu items

py click_element.py --list # List clickable elements

py click_element.py --list --window "Chrome" # List in specific window

Click buttons, links, menu items by name without needing coordinates.

Read Screen Region (OCR - Optional)

py read_region.py 100 100 500 300 # Read text from coordinates

Note: Requires a Tesseract OCR installation. Prefer read_window.py whenever the text is available through UI Automation; it gives better results.

Workflow Pattern

1. Read window - Extract text from specific window (fast, accurate)

2. Read UI elements - Get buttons, links with coordinates

3. Screenshot (if needed) - See visual layout

4. Act - Click element by name or coordinates

5. Handle dialogs - Interact with popups/save dialogs

6. Read window - Verify changes

Screen Coordinates

  • Origin (0, 0) is the top-left corner
  • Example screen: 2560x1440 (check your own resolution with a screenshot)
  • Use coordinates from screenshot analysis
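If the screenshot is analyzed at a reduced size, coordinates read from the image have to be scaled back to the real screen before clicking. The sketch below shows that arithmetic under the assumption that the image is a uniformly resized copy of the full screen. The function name is illustrative, and the 1280x720 image size is only an example.

```python
def to_screen_coords(x, y, image_size, screen_size):
    """Map a point in a resized screenshot to absolute screen coordinates."""
    image_w, image_h = image_size
    screen_w, screen_h = screen_size
    sx = round(x * screen_w / image_w)
    sy = round(y * screen_h / image_h)
    # Clamp so the result always lies on the screen (origin is top-left).
    return min(max(sx, 0), screen_w - 1), min(max(sy, 0), screen_h - 1)


# A point at (250, 150) in a 1280x720 copy of a 2560x1440 screen:
print(to_screen_coords(250, 150, (1280, 720), (2560, 1440)))  # (500, 300)
```

When the screenshot is used at full resolution, the image and screen sizes are equal and the coordinates pass through unchanged.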

Examples

Open Notepad and type

# Press Windows key

py key_press.py "win"

# Type "notepad"

py type_text.py "notepad"

# Press Enter

py key_press.py "enter"

# Wait a moment, then type

py type_text.py "Hello from AI!"

# Save

py key_press.py "ctrl+s"
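The same sequence can be driven from a single Python script, which makes the "wait a moment" step explicit. This is a sketch only: it assumes the py launcher is on PATH and that it runs from the directory containing skills/windows-control/scripts/. The delay values are guesses to tune per machine, and build_command and run_steps are illustrative names.

```python
import subprocess
import time
from pathlib import Path

SCRIPTS_DIR = Path("skills/windows-control/scripts")  # location given in Quick Start


def build_command(script, *args):
    """Build the argv list for one script call, e.g. py .../click.py 500 300."""
    return ["py", str(SCRIPTS_DIR / script), *map(str, args)]


def run_steps(steps, runner=subprocess.run, sleep=time.sleep):
    """Run (script, args, delay_after_seconds) steps in order."""
    for script, args, delay in steps:
        runner(build_command(script, *args), check=True)  # stop on first failure
        sleep(delay)  # give the UI time to react before the next step


# Delays are assumptions; slow machines may need longer waits.
NOTEPAD_STEPS = [
    ("key_press.py", ["win"], 0.5),
    ("type_text.py", ["notepad"], 0.5),
    ("key_press.py", ["enter"], 2.0),  # wait for Notepad to open
    ("type_text.py", ["Hello from AI!"], 0.2),
    ("key_press.py", ["ctrl+s"], 0.0),
]

# On a Windows machine with the skill installed:
# run_steps(NOTEPAD_STEPS)
```

Passing arguments as a list avoids shell quoting issues with text such as "Hello from AI!".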

Click in VS Code

# Read current VS Code content

py read_window.py "Visual Studio Code"

# Click at specific location (e.g., file explorer)

py click.py 50 100

# Type filename

py type_text.py "test.js"

# Press Enter

py key_press.py "enter"

# Verify new file opened

py read_window.py "Visual Studio Code"

Monitor Notepad changes

# Read current content

py read_window.py "Notepad"

# User types something...

# Read updated content (no screenshot needed!)

py read_window.py "Notepad"

Text Reading Methods

Method 1: Windows UI Automation (BEST)
  • Use read_window.py for any window
  • Use read_ui_elements.py for buttons/links with coordinates
  • Use read_webpage.py for browser content with structure
  • Gets actual text data (not image-based)
Method 2: Click by Name (NEW)
  • Use click_element.py to click buttons/links by name
  • No coordinates needed - finds elements automatically
  • Works across all windows or in a specific target window
Method 3: Dialog Handling (NEW)
  • Use handle_dialog.py for popups, save dialogs, alerts
  • Read dialog content, click buttons, type text
  • Auto-dismiss with common buttons (OK, Cancel, etc.)
Method 4: Screenshot + Vision (Fallback)
  • Take full screenshot
  • AI reads text visually
  • Slower but works for any content
Method 5: OCR (Optional)
  • Use read_region.py with Tesseract
  • Requires additional installation
  • Good for images/PDFs with text

Safety Features

  • pyautogui.FAILSAFE = True (move mouse to top-left to abort)
  • Small delays between actions
  • Smooth mouse movements (not instant jumps)
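In pyautogui terms, these three features map onto the FAILSAFE flag, the PAUSE setting, and the duration argument of moveTo. The sketch below shows how a script might apply them. It is not taken from the skill's source, the 0.1 s pause is an assumed value, and the module is passed in as a parameter so the helpers can be exercised without a display.

```python
def configure_safety(gui, pause_seconds: float = 0.1) -> None:
    """Enable the fail-safe corner and add a delay after each pyautogui call."""
    gui.FAILSAFE = True        # moving the mouse to the top-left corner aborts
    gui.PAUSE = pause_seconds  # small delay between actions (assumed value)


def smooth_move(gui, x: int, y: int, duration: float = 0.2) -> None:
    """Glide the pointer to (x, y) over duration seconds instead of jumping."""
    gui.moveTo(x, y, duration=duration)


# Typical use on a machine with a desktop session:
# import pyautogui
# configure_safety(pyautogui)
# smooth_move(pyautogui, 500, 300)
```

With FAILSAFE enabled, pyautogui raises pyautogui.FailSafeException when the pointer reaches the fail-safe corner, which stops a runaway script.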

Requirements

  • Python 3.11+
  • pyautogui (installed ✅)
  • pillow (installed ✅)

Tips

  • Always take a screenshot first to see the current state
  • Coordinates are absolute (not relative to windows)
  • Wait briefly after clicks for the UI to update
  • Prefer actions that can be undone with Ctrl+Z when possible

---

Status: ✅ READY FOR USE (v2.0 - Dialog & UI Elements)
Created: 2026-02-01
Updated: 2026-02-02
