OpenAdapt

Auto-generated from OpenAdaptAI/OpenAdapt. Last synced: 2026-03-04 01:23 UTC

OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)

OpenAdapt is the open-source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents, all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai

Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

Package	Description	Repository
`openadapt`	Meta-package with unified CLI	This repo
`openadapt-capture`	Event recording and storage	openadapt-capture
`openadapt-ml`	ML engine, training, inference	openadapt-ml
`openadapt-evals`	Benchmark evaluation	openadapt-evals
`openadapt-viewer`	HTML visualization	openadapt-viewer
`openadapt-grounding`	UI element localization	openadapt-grounding
`openadapt-retrieval`	Multimodal demo retrieval	openadapt-retrieval
`openadapt-privacy`	PII/PHI scrubbing	openadapt-privacy
`openadapt-wright`	Dev automation	openadapt-wright
`openadapt-herald`	Social media from git history	openadapt-herald
`openadapt-crier`	Telegram approval bot	openadapt-crier
`openadapt-consilium`	Multi-model consensus	openadapt-consilium
`openadapt-desktop`	Desktop GUI application	openadapt-desktop
`openadapt-tray`	System tray app	openadapt-tray
`openadapt-agent`	Production execution engine	openadapt-agent
`openadapt-telemetry`	Error tracking	openadapt-telemetry

Installation

Install what you need:

pip install openadapt              # Minimal CLI only
pip install openadapt[capture]     # GUI capture/recording
pip install openadapt[ml]          # ML training and inference
pip install openadapt[evals]       # Benchmark evaluation
pip install openadapt[privacy]     # PII/PHI scrubbing
pip install openadapt[all]         # Everything

Requirements: Python 3.10+
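
As a quick sanity check before installing extras, the sketch below reports whether the interpreter meets the Python 3.10+ requirement and which of the packages listed above are already installed. It uses only the standard library (`importlib.metadata`). The function names `python_ok` and `installed_versions` are illustrative and not part of OpenAdapt, and the sketch assumes the PyPI distribution names match the package table. The `openadapt doctor` and `openadapt version` commands remain the supported checks.

```python
import sys
from importlib import metadata

# Distribution names copied from the package table above (assumed to match PyPI).
SUB_PACKAGES = [
    "openadapt",
    "openadapt-capture",
    "openadapt-ml",
    "openadapt-evals",
    "openadapt-privacy",
]


def python_ok(min_version=(3, 10)):
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def installed_versions(names=SUB_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```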

Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa

4. View recordings

openadapt capture view my-task
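
If you want to script the four steps above, one option is to assemble the command lines in Python first. The sketch below only builds and prints the argument lists; it does not run anything. The flags and default values are copied from the Quick Start commands, while the helper name `quick_start_commands` is hypothetical.

```python
import shlex


def quick_start_commands(name, model="qwen3vl-2b",
                         checkpoint="training_output/model.pt",
                         benchmark="waa"):
    """Return the four Quick Start commands as argument lists, in order."""
    return [
        ["openadapt", "capture", "start", "--name", name],
        ["openadapt", "train", "start", "--capture", name, "--model", model],
        ["openadapt", "eval", "run", "--checkpoint", checkpoint,
         "--benchmark", benchmark],
        ["openadapt", "capture", "view", name],
    ]


if __name__ == "__main__":
    for argv in quick_start_commands("my-task"):
        # shlex.join quotes arguments for display; nothing is executed here.
        print(shlex.join(argv))
```

To execute a step, pass one of these lists to `subprocess.run(argv, check=True)`. Keep in mind that the capture step records until it is interrupted with Ctrl+C, so it is not a fire-and-forget call.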

Ecosystem

Core Platform Components

Package	Description	Repository
`openadapt`	Meta-package with unified CLI	This repo
`openadapt-capture`	Event recording and storage	openadapt-capture
`openadapt-ml`	ML engine, training, inference	openadapt-ml
`openadapt-evals`	Benchmark evaluation	openadapt-evals
`openadapt-viewer`	HTML visualization	openadapt-viewer
`openadapt-grounding`	UI element localization	openadapt-grounding
`openadapt-retrieval`	Multimodal demo retrieval	openadapt-retrieval
`openadapt-privacy`	PII/PHI scrubbing	openadapt-privacy

Applications and Tools

Package	Description	Repository
`openadapt-desktop`	Desktop GUI application	openadapt-desktop
`openadapt-tray`	System tray app	openadapt-tray
`openadapt-agent`	Production execution engine	openadapt-agent
`openadapt-wright`	Dev automation	openadapt-wright
`openadapt-herald`	Social media from git history	openadapt-herald
`openadapt-crier`	Telegram approval bot	openadapt-crier
`openadapt-consilium`	Multi-model consensus	openadapt-consilium
`openadapt-telemetry`	Error tracking	openadapt-telemetry

CLI Reference

openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline:

1. DEMONSTRATE (Observation Collection)
- Capture: Record user actions and screenshots with openadapt-capture
- Privacy: Scrub PII/PHI from recordings with openadapt-privacy
- Store: Build a searchable demonstration library

2. LEARN (Policy Acquisition)
- Retrieval Path: Embed demonstrations, index them, and enable semantic search
- Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
- Abstraction: Progress from literal replay to template-based automation

3. EXECUTE (Agent Deployment)
- Observe: Take screenshots and gather accessibility information
- Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
- Ground: Map intentions to specific UI coordinates with openadapt-grounding
- Act: Execute validated actions with safety gates
- Evaluate: Measure success with openadapt-evals and feed results back for improvement
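
The EXECUTE phase can be pictured as a loop over observe, policy, ground, safety check, and act. The sketch below is a toy illustration of that control flow, not OpenAdapt's implementation: every name in it (`Intent`, `GroundedAction`, `run_episode`, and the stub callables) is hypothetical, and the "screen" is a dictionary standing in for a real GUI.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Intent:
    """What the policy wants to do, without coordinates."""
    kind: str    # e.g. "click", or "done" when the task is finished
    target: str  # description of the UI element


@dataclass
class GroundedAction:
    """An intent resolved to screen coordinates."""
    kind: str
    x: int
    y: int


def run_episode(
    observe: Callable[[], str],
    policy: Callable[[str], Intent],
    ground: Callable[[Intent, str], Optional[Tuple[int, int]]],
    is_safe: Callable[[GroundedAction], bool],
    act: Callable[[GroundedAction], None],
    max_steps: int = 10,
) -> List[GroundedAction]:
    """Run observe -> policy -> ground -> safety gate -> act until done."""
    executed = []
    for _ in range(max_steps):
        observation = observe()
        intent = policy(observation)
        if intent.kind == "done":
            break
        coords = ground(intent, observation)
        if coords is None:
            break  # grounding could not locate the target
        action = GroundedAction(intent.kind, *coords)
        if not is_safe(action):
            break  # the safety gate rejected the action
        act(action)
        executed.append(action)
    return executed


# Toy stand-ins so the loop runs without a real GUI or model.
screen = {"state": "login"}


def observe():
    return screen["state"]


def policy(observation):
    if observation == "home":
        return Intent("done", "")
    return Intent("click", "Sign in button")


def ground(intent, observation):
    return (120, 340) if intent.target == "Sign in button" else None


def act(action):
    screen["state"] = "home"  # pretend the click navigated to the home screen


trace = run_episode(observe, policy, ground, lambda action: True, act)
```

The returned `trace` is the list of actions that passed the gate and were executed, which is the kind of record an evaluation step could score and feed back.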

Core Approach: Demo-Conditioned Prompting

OpenAdapt explores demonstration-conditioned automation - "show, don't tell":

Traditional Agent	OpenAdapt Agent
User writes prompts	User records demonstration
Ambiguous instructions	Grounded in actual UI
Requires prompt engineering	Reduced prompt engineering
Context-free	Context from similar demos

Retrieval powers both training and evaluation: similar demonstrations are retrieved and supplied as context to the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100%. One caveat: all 45 tasks in that benchmark share the same navigation entry point. See the publication roadmap for methodology and limitations.
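
To make the retrieval idea concrete, here is a minimal sketch of demo-conditioned prompting: rank stored demonstrations by similarity to the new task, then prepend the best match to the prompt. It is an illustration under loud assumptions. The bag-of-words `embed` function is a stand-in for whatever multimodal encoder openadapt-retrieval actually uses, the prompt layout is invented, and the two library entries are made-up examples, not tasks from the benchmark mentioned above.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a real system would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(task, demos, k=1):
    """Return the k demos whose task descriptions are most similar to task."""
    query = embed(task)
    ranked = sorted(demos, key=lambda d: cosine(query, embed(d["task"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(task, demos):
    """Place retrieved demonstrations ahead of the new task (hypothetical layout)."""
    lines = []
    for demo in demos:
        lines.append(f"Demonstration: {demo['task']}")
        lines.extend(f"  {i}. {step}" for i, step in enumerate(demo["steps"], 1))
    lines.append(f"Task: {task}")
    lines.append("Next action:")
    return "\n".join(lines)


# Invented demonstration library, for illustration only.
library = [
    {"task": "turn off night shift in display settings",
     "steps": ["click Apple menu", "click System Settings", "click Displays"]},
    {"task": "rename a file in finder",
     "steps": ["click the file", "press Return", "type the new name"]},
]

new_task = "turn on night shift"
prompt = build_prompt(new_task, retrieve(new_task, library))
```

The design point is that the model sees a grounded sequence of steps from a similar task rather than only a free-text instruction.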

Key Concepts

Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
Evaluation-Driven Feedback: Success traces become new training data
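
As an illustration of the Safety Gate concept, the sketch below classifies a proposed action as allowed, blocked, or needing confirmation, and only asks the user in the last case. The risk rules, the `Verdict` enum, and the `ProposedAction` type are all hypothetical; OpenAdapt's actual gate may use entirely different criteria.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ALLOW = "allow"
    CONFIRM = "confirm"  # high-risk: ask a human first
    BLOCK = "block"


@dataclass(frozen=True)
class ProposedAction:
    kind: str    # "click", "type", "scroll", ...
    target: str  # element description chosen by the policy


# Hypothetical rules, chosen only to make the example concrete.
ALLOWED_KINDS = {"click", "type", "scroll"}
HIGH_RISK_WORDS = ("delete", "send", "pay", "submit")


def assess(action: ProposedAction) -> Verdict:
    """Classify an action before it is executed."""
    if action.kind not in ALLOWED_KINDS:
        return Verdict.BLOCK
    if any(word in action.target.lower() for word in HIGH_RISK_WORDS):
        return Verdict.CONFIRM
    return Verdict.ALLOW


def gate(action: ProposedAction,
         confirm: Callable[[ProposedAction], bool]) -> bool:
    """Return True if the action may run; consult confirm only when required."""
    verdict = assess(action)
    if verdict is Verdict.CONFIRM:
        return confirm(action)
    return verdict is Verdict.ALLOW
```

In an interactive session, `confirm` could prompt on the terminal; in an unattended run it could simply return `False` so that high-risk actions are never executed without review.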

Terminology

Term	Description
Observation	What the agent perceives (screenshot, accessibility tree)
Action	What the agent does (click, type, scroll, etc.)
Trajectory	Sequence of observation-action pairs
Demonstration	Human-provided example trajectory
Policy	Decision-making component that maps observations to actions
Grounding	Mapping intent to specific UI elements (coordinates)
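
The terms in this table map naturally onto a few small data types. The sketch below is one possible modeling in plain dataclasses, intended only to show how the terms relate; the field names are assumptions and do not reflect the schema used by openadapt-capture or openadapt-ml.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """What the agent perceives."""
    screenshot_path: str
    accessibility_tree: Optional[dict] = None


@dataclass
class Action:
    """What the agent does."""
    kind: str                                      # "click", "type", "scroll", ...
    coordinates: Optional[Tuple[int, int]] = None  # filled in by grounding
    text: Optional[str] = None


@dataclass
class Trajectory:
    """Sequence of observation-action pairs."""
    steps: List[Tuple[Observation, Action]] = field(default_factory=list)

    def append(self, observation: Observation, action: Action) -> None:
        self.steps.append((observation, action))


@dataclass
class Demonstration(Trajectory):
    """A human-provided example trajectory, labeled with its task."""
    task: str = ""


demo = Demonstration(task="open settings")
demo.append(Observation("frame_000.png"), Action("click", coordinates=(40, 12)))
```

In this modeling, a policy would be any callable from an `Observation` to an `Action` without coordinates, and grounding would fill in `coordinates` afterwards.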

Demos

Legacy Version (v0.46.0) Examples:
- Twitter Demo - Early OpenAdapt demonstration
- Loom Video - Process automation walkthrough

Note: These demos show the legacy monolithic version. For current v1.0+ modular architecture examples, see the documentation.

Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See the permissions guide.

Windows: Run as Administrator if needed for input capture.
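
A setup script could surface the right reminder for the current platform. The sketch below is a small illustration using the standard library's `platform.system()`; the `permission_hint` helper is hypothetical, the hint text is taken from this section, and nothing here checks whether the permissions are actually granted.

```python
import platform

# Hint wording taken from the Permissions section above.
HINTS = {
    "Darwin": ("Grant Accessibility, Screen Recording, and Input Monitoring "
               "permissions to your terminal."),
    "Windows": "Run as Administrator if needed for input capture.",
}


def permission_hint(system=None):
    """Return the permission reminder for a platform name (default: this one)."""
    system = system or platform.system()
    return HINTS.get(system, "No platform-specific guidance is listed for this system.")


if __name__ == "__main__":
    print(permission_hint())
```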

Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for the migration guide and details.

Contributing

1. Join Discord
2. Pick an issue from the relevant sub-package repository
3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"

Related projects:

OpenAdaptAI/SoM - Set-of-Mark prompting
OpenAdaptAI/pynput - Input monitoring fork
OpenAdaptAI/atomacos - macOS accessibility

Support

Discord: https://discord.gg/yF527cQbDG
Issues: Use the relevant sub-package repository
Architecture docs: GitHub Wiki

License

MIT License - see LICENSE for details.
