Initial backup 2026-02-17

2026-02-17 15:50:53 +00:00
commit 8902a93add
941 changed files with 131420 additions and 0 deletions
--- a/skills/desktop-control/AI_AGENT_GUIDE.md
+++ b/skills/desktop-control/AI_AGENT_GUIDE.md
@@ -0,0 +1,448 @@
+# AI Desktop Agent - Cognitive Automation Guide
+
+## 🤖 What Is This?
+
+The **AI Desktop Agent** is an intelligent layer on top of the basic desktop control that **understands** what you want and figures out how to do it autonomously.
+
+Unlike basic automation that requires exact instructions, the AI Agent:
+- **Understands natural language** ("Draw a cat in Paint")
+- **Plans the steps** automatically
+- **Executes autonomously** 
+- **Adapts** based on what it sees
+
+---
+
+## 🎯 What Can It Do?
+
+### ✅ Autonomous Drawing
+```python
+from skills.desktop_control.ai_agent import AIDesktopAgent
+
+agent = AIDesktopAgent()
+
+# Just describe what you want!
+agent.execute_task("Draw a circle in Paint")
+agent.execute_task("Draw a star in MS Paint")
+agent.execute_task("Draw a house with a sun")
+```
+
+**What it does:**
+1. Opens MS Paint
+2. Selects pencil tool
+3. Figures out how to draw the requested shape
+4. Draws it autonomously
+5. Takes a screenshot of the result
+
+### ✅ Autonomous Text Entry
+```python
+# It figures out where to type
+agent.execute_task("Type 'Hello World' in Notepad")
+agent.execute_task("Write an email saying thank you")
+```
+
+**What it does:**
+1. Opens Notepad (or finds active text editor)
+2. Types the text naturally
+3. Formats if needed
+
+### ✅ Autonomous Application Control
+```python
+# It knows how to open apps
+agent.execute_task("Open Calculator")
+agent.execute_task("Launch Microsoft Paint")
+agent.execute_task("Open File Explorer")
+```
+
+### ✅ Autonomous Game Playing (Advanced)
+```python
+# It will try to play the game!
+agent.execute_task("Play Solitaire for me")
+agent.execute_task("Play Minesweeper")
+```
+
+**What it does:**
+1. Analyzes the game screen
+2. Detects game state (cards, mines, etc.)
+3. Decides best move
+4. Executes the move
+5. Repeats until win/lose
+
+---
+
+## 🏗️ How It Works
+
+### Architecture
+
+```
+User Request ("Draw a cat")
+    ↓
+Natural Language Understanding
+    ↓
+Task Planning (Step-by-step plan)
+    ↓
+Step Execution Loop:
+    - Observe Screen (Computer Vision)
+    - Decide Action (AI Reasoning)
+    - Execute Action (Desktop Control)
+    - Verify Result
+    ↓
+Task Complete!
+```
+
+### Key Components
+
+1. **Task Planner** - Breaks down high-level tasks into steps
+2. **Vision System** - Understands what's on screen (screenshots, OCR, object detection)
+3. **Reasoning Engine** - Decides what to do next
+4. **Action Executor** - Performsthe actual mouse/keyboard actions
+5. **Feedback Loop** - Verifies actions succeeded
+
+---
+
+## 📋 Supported Tasks (Current)
+
+### Tier 1: Fully Automated ✅
+
+| Task Pattern | Example | Status |
+|-------------|---------|---------|
+| Draw shapes in Paint | "Draw a circle" | ✅ Working |
+| Basic text entry | "Type Hello" | ✅ Working |
+| Launch applications | "Open Paint" | ✅ Working |
+
+### Tier 2: Partially Automated 🔨
+
+| Task Pattern | Example | Status |
+|-------------|---------|---------|
+| Form filling | "Fill out this form" | 🔨 In Progress |
+| File operations | "Copy these files" | 🔨 In Progress |
+| Web navigation | "Find on Google" | 🔨 Planned |
+
+### Tier 3: Experimental 🧪
+
+| Task Pattern | Example | Status |
+|-------------|---------|---------|
+| Game playing | "Play Solitaire" | 🧪 Experimental |
+| Image editing | "Resize this photo" | 🧪 Planned |
+| Code editing | "Fix this bug" | 🧪 Research |
+
+---
+
+## 🎨 Example: Drawing in Paint
+
+### Simple Request
+```python
+agent = AIDesktopAgent()
+result = agent.execute_task("Draw a circle in Paint")
+
+# Check result
+print(f"Status: {result['status']}")
+print(f"Steps taken: {len(result['steps'])}")
+```
+
+### What Happens Behind the Scenes
+
+**1. Planning Phase:**
+```
+Plan generated:
+  Step 1: Launch MS Paint
+  Step 2: Wait 2s for Paint to load
+  Step 3: Activate Paint window
+  Step 4: Select pencil tool (press 'P')
+  Step 5: Draw circle at canvas center
+  Step 6: Screenshot the result
+```
+
+**2. Execution Phase:**
+```
+[✓] Launched Paint via Win+R → mspaint
+[✓] Waited 2.0s
+[✓] Activated window "Paint"
+[✓] Pressed 'P' to select pencil
+[✓] Drew circle with 72 points
+[✓] Screenshot saved: drawing_result.png
+```
+
+**3. Result:**
+```python
+{
+    "task": "Draw a circle in Paint",
+    "status": "completed",
+    "success": True,
+    "steps": [... 6 steps ...],
+    "screenshots": [... 6 screenshots ...],
+}
+```
+
+---
+
+## 🎮 Example: Game Playing
+
+```python
+agent = AIDesktopAgent()
+
+# Play a simple game
+result = agent.execute_task("Play Solitaire for me")
+```
+
+### Game Playing Loop
+
+```
+1. Analyze screen → Detect cards, positions
+2. Identify valid moves → Find legal plays
+3. Evaluate moves → Which is best?
+4. Execute move → Click and drag card
+5. Repeat until game ends
+```
+
+### Game-Specific Intelligence
+
+The agent can learn patterns for:
+- **Solitaire**: Card stacking rules, suit matching
+- **Minesweeper**: Probability calculations, safe clicks
+- **2048**: Tile merging strategy
+- **Chess** (if integrated with engine): Move evaluation
+
+---
+
+## 🧠 Enhancing the AI
+
+### Adding Application Knowledge
+
+```python
+# In ai_agent.py, add to app_knowledge:
+
+self.app_knowledge = {
+    "photoshop": {
+        "name": "Adobe Photoshop",
+        "launch_command": "photoshop",
+        "common_actions": {
+            "new_layer": {"hotkey": ["ctrl", "shift", "n"]},
+            "brush_tool": {"hotkey": ["b"]},
+            "eraser": {"hotkey": ["e"]},
+        }
+    }
+}
+```
+
+### Adding Custom Task Patterns
+
+```python
+# Add a custom planning method
+def _plan_photo_edit(self, task: str) -> List[Dict]:
+    """Plan for photo editing tasks."""
+    return [
+        {"type": "launch_app", "app": "photoshop"},
+        {"type": "wait", "duration": 3.0},
+        {"type": "open_file", "path": extracted_path},
+        {"type": "apply_filter", "filter": extracted_filter},
+        {"type": "save_file"},
+    ]
+```
+
+---
+
+## 🔥 Advanced: Vision + Reasoning
+
+### Screen Analysis
+
+The agent can analyze screenshots to:
+- **Detect UI elements** (buttons, text fields, menus)
+- **Read text** (OCR for labels, instructions)
+- **Identify objects** (icons, images, game pieces)
+- **Understand layout** (where things are)
+
+```python
+# Analyze what's on screen
+analysis = agent._analyze_screen()
+
+print(analysis)
+# Output:
+# {
+#     "active_window": "Untitled - Paint",
+#     "mouse_position": (640, 480),
+#     "detected_elements": [...],
+#     "text_found": [...],
+# }
+```
+
+### Integration with OpenClaw LLM
+
+```python
+# Future: Use OpenClaw's LLM for reasoning
+agent = AIDesktopAgent(llm_client=openclaw_llm)
+
+# The agent can now:
+# - Reason about complex tasks
+# - Understand context better
+# - Plan more sophisticated workflows
+# - Learn from feedback
+```
+
+---
+
+## 🛠️ Extending for Your Needs
+
+### Add Support for New Apps
+
+1. **Identify the app**
+2. **Document common actions**
+3. **Add to knowledge base**
+4. **Create planning method**
+
+Example: Adding Excel support
+
+```python
+# Step 1: Add to app_knowledge
+"excel": {
+    "name": "Microsoft Excel",
+    "launch_command": "excel",
+    "common_actions": {
+        "new_sheet": {"hotkey": ["shift", "f11"]},
+        "sum_formula": {"action": "type", "text": "=SUM()"},
+    }
+}
+
+# Step 2: Create planner
+def _plan_excel_task(self, task: str) -> List[Dict]:
+    return [
+        {"type": "launch_app", "app": "excel"},
+        {"type": "wait", "duration": 2.0},
+        # ... specific Excel steps
+    ]
+
+# Step 3: Hook into main planner
+if "excel" in task_lower or "spreadsheet" in task_lower:
+    return self._plan_excel_task(task)
+```
+
+---
+
+## 🎯 Real-World Use Cases
+
+### 1. Automated Form Filling
+```python
+agent.execute_task("Fill out the job application with my resume data")
+```
+
+### 2. Batch Image Processing
+```python
+agent.execute_task("Resize all images in this folder to 800x600")
+```
+
+### 3. Social Media Posting
+```python
+agent.execute_task("Post this image to Instagram with caption 'Beautiful sunset'")
+```
+
+### 4. Data Entry
+```python
+agent.execute_task("Copy data from this PDF to Excel spreadsheet")
+```
+
+### 5. Testing
+```python
+agent.execute_task("Test the login form with invalid credentials")
+```
+
+---
+
+## ⚙️ Configuration
+
+### Enable/Disable Failsafe
+```python
+# Safe mode (default)
+agent = AIDesktopAgent(failsafe=True)
+
+# Fast mode (no failsafe)
+agent = AIDesktopAgent(failsafe=False)
+```
+
+### Set Max Steps
+```python
+# Prevent infinite loops
+result = agent.execute_task("Play game", max_steps=100)
+```
+
+### Access Action History
+```python
+# Review what the agent did
+print(agent.action_history)
+```
+
+---
+
+## 🐛 Debugging
+
+### View Step-by-Step Execution
+```python
+result = agent.execute_task("Draw a star in Paint")
+
+for i, step in enumerate(result['steps'], 1):
+    print(f"Step {i}: {step['step']['description']}")
+    print(f"  Success: {step['success']}")
+    if 'error' in step:
+        print(f"  Error: {step['error']}")
+```
+
+### View Screenshots
+```python
+# Each step captures before/after screenshots
+for screenshot_pair in result['screenshots']:
+    before = screenshot_pair['before']
+    after = screenshot_pair['after']
+    
+    # Display or save for analysis
+    before.save(f"step_{screenshot_pair['step']}_before.png")
+    after.save(f"step_{screenshot_pair['step']}_after.png")
+```
+
+---
+
+## 🚀 Future Enhancements
+
+Planned features:
+
+- [ ] **Computer Vision**: OCR, object detection, UI element recognition
+- [ ] **LLM Integration**: Natural language understanding with OpenClaw LLM
+- [ ] **Learning**: Remember successful patterns, improve over time
+- [ ] **Multi-App Workflows**: "Get data from Chrome and put in Excel"
+- [ ] **Voice Control**: "Alexa, draw a cat in Paint"
+- [ ] **Autonomous Debugging**: Fix errors automatically
+- [ ] **Game AI**: Reinforcement learning for game playing
+- [ ] **Web Automation**: Full browser control with understanding
+
+---
+
+## 📚 Full API
+
+### Main Methods
+
+```python
+# Execute a task
+result = agent.execute_task(task: str, max_steps: int = 50)
+
+# Analyze screen
+analysis = agent._analyze_screen()
+
+# Manual mode: Execute individual steps
+step = {"type": "launch_app", "app": "paint"}
+result = agent._execute_step(step)
+```
+
+### Result Structure
+
+```python
+{
+    "task": str,                    # Original task
+    "status": str,                  # "completed", "failed", "error"
+    "success": bool,                # Overall success
+    "steps": List[Dict],            # All steps executed
+    "screenshots": List[Dict],      # Before/after screenshots
+    "failed_at_step": int,          # If failed, which step
+    "error": str,                   # Error message if failed
+}
+```
+
+---
+
+**🦞 Built for OpenClaw - The future of desktop automation!**