openadapt-grounding

UI element grounding for improved action accuracy.

Repository: OpenAdaptAI/openadapt-grounding

Installation

pip install "openadapt[grounding]"  # quotes stop shells such as zsh from globbing the brackets
# or
pip install openadapt-grounding

Overview

The grounding package detects UI elements in screenshots and grounds actions to them, improving:

Click accuracy by targeting element centers
Robustness to UI changes
Visual understanding of interfaces
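Targeting an element's center amounts to clicking the midpoint of its bounding box. A minimal sketch, assuming boxes are `(x1, y1, x2, y2)` pixel tuples as in the example CLI output under CLI Commands; `box_center` is a hypothetical helper, not part of the package API:

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) = (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer pixel at the midpoint of a bounding box."""
    x1, y1, x2, y2 = box
    # Integer division keeps the result on the pixel grid.
    return (x1 + x2) // 2, (y1 + y2) // 2


# The "Submit" button from the example CLI output.
print(box_center((450, 320, 520, 350)))  # (485, 335)
```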

Features

Element Detection

Detect UI elements in screenshots:

Buttons
Text fields
Links
Icons
Menus

Bounding Box Extraction

Get the pixel coordinates of each detected element as a bounding box.
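The four numbers in a box are commonly read as (left, top, right, bottom). The sketch below assumes that convention and defines a hypothetical `Box` dataclass (the package's own `BoundingBox` may look different) with width, height, a containment test, and intersection-over-union (IoU), a common way to decide whether two boxes cover the same region:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical pixel box: (x1, y1) top-left, (x2, y2) bottom-right (exclusive)."""
    x1: int
    y1: int
    x2: int
    y2: int

    @property
    def width(self) -> int:
        return max(0, self.x2 - self.x1)

    @property
    def height(self) -> int:
        return max(0, self.y2 - self.y1)

    @property
    def area(self) -> int:
        return self.width * self.height

    def contains(self, x: int, y: int) -> bool:
        # Half-open on the right/bottom edges so adjacent boxes don't share pixels.
        return self.x1 <= x < self.x2 and self.y1 <= y < self.y2

    def iou(self, other: "Box") -> float:
        """Intersection-over-union in [0, 1]; 1.0 means identical boxes."""
        # Non-overlapping boxes yield an empty intersection (width or height 0).
        inter = Box(
            max(self.x1, other.x1),
            max(self.y1, other.y1),
            min(self.x2, other.x2),
            min(self.y2, other.y2),
        )
        union = self.area + other.area - inter.area
        return inter.area / union if union else 0.0


submit = Box(450, 320, 520, 350)
print(submit.width, submit.height)                     # 70 30
print(submit.contains(485, 335))                       # True
print(round(submit.iou(Box(455, 320, 525, 350)), 3))   # 0.867
```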

Set-of-Mark (SoM) Prompting

Overlay numbered markers on detected elements so that a large multimodal model (LMM) can refer to an element by its number.
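Behind the overlay, Set-of-Mark prompting needs some bookkeeping: each element gets a number, a position where that number is drawn, and an entry in a lookup from number to element. A drawing-free sketch, assuming detections arrive as `(label, box)` pairs; `assign_marks`, the reading-order numbering, and the top-left marker placement are illustrative choices, not the behavior of `SoMPrompt`:

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) pixel box


def assign_marks(
    detections: List[Tuple[str, Box]],
) -> Tuple[Dict[int, str], Dict[int, Box], Dict[int, Tuple[int, int]]]:
    """Number detections from 1 in reading order (top-to-bottom, then left-to-right).

    Returns three maps keyed by mark number: description, box, and the point
    where the numbered marker would be drawn (the box's top-left corner).
    """
    ordered = sorted(detections, key=lambda d: (d[1][1], d[1][0]))
    labels: Dict[int, str] = {}
    boxes: Dict[int, Box] = {}
    anchors: Dict[int, Tuple[int, int]] = {}
    for mark, (label, box) in enumerate(ordered, start=1):
        labels[mark] = label
        boxes[mark] = box
        anchors[mark] = (box[0], box[1])
    return labels, boxes, anchors


labels, boxes, anchors = assign_marks([
    ("Submit button", (450, 320, 520, 350)),
    ("Email field", (200, 200, 400, 230)),
])
print(labels)   # {1: 'Email field', 2: 'Submit button'}
print(anchors)  # {1: (200, 200), 2: (450, 320)}
```

Reading order is only one way to number marks; any stable order works as long as the same map is used to interpret the model's answer.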

Python API

from openadapt_grounding import ElementDetector, SoMPrompt

# Path to the screenshot to analyze
screenshot_path = "screenshot.png"

# Detect elements in a screenshot
detector = ElementDetector()
elements = detector.detect(screenshot_path)

for element in elements:
    print(f"{element.label}: {element.bbox}")

# Create Set-of-Mark prompt
som = SoMPrompt(screenshot_path)
marked_image, element_map = som.create()

# element_map: {1: "Submit button", 2: "Email field", ...}
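After the marked image is sent to an LMM, its reply has to be mapped back to screen coordinates. The sketch below assumes the model answers with a bracketed mark number such as `[1]`, and that you keep a mark-to-box mapping next to `element_map` (which holds descriptions only); `resolve_mark` and `mark_boxes` are hypothetical names. It returns the center of the chosen element:

```python
import re
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) pixel box


def resolve_mark(reply: str, mark_boxes: Dict[int, Box]) -> Optional[Tuple[int, int]]:
    """Return the center of the first known bracketed mark in `reply`, or None."""
    for match in re.finditer(r"\[(\d+)\]", reply):
        box = mark_boxes.get(int(match.group(1)))
        if box is not None:  # skip numbers that don't correspond to a mark
            x1, y1, x2, y2 = box
            return (x1 + x2) // 2, (y1 + y2) // 2
    return None  # no valid mark in the reply


# Same numbering as the element_map example: 1 = Submit button, 2 = Email field.
mark_boxes = {1: (450, 320, 520, 350), 2: (200, 200, 400, 230)}
print(resolve_mark("I will click [1] to submit.", mark_boxes))  # (485, 335)
print(resolve_mark("Click [7]", mark_boxes))                    # None
```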

Integration with Policy Execution

from openadapt_ml import AgentPolicy
from openadapt_grounding import ElementDetector

# Create policy with grounding
policy = AgentPolicy.from_checkpoint(
    "model.pt",
    grounding=ElementDetector()
)

# load_screenshot() is a placeholder for your own screenshot-capture code
observation = load_screenshot()
# Predicted actions use grounded coordinates
action = policy.predict(observation)
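One way a grounding step can adjust a predicted click is to snap it to the center of the detected element it lands in, falling back to the nearest element center. This is a hedged sketch of that idea, not a description of how `AgentPolicy` uses its `grounding` argument; `snap_click` and `max_distance` are hypothetical:

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) pixel box
Point = Tuple[int, int]


def center(box: Box) -> Point:
    x1, y1, x2, y2 = box
    return (x1 + x2) // 2, (y1 + y2) // 2


def snap_click(point: Point, boxes: List[Box], max_distance: float = 50.0) -> Point:
    """Move a predicted click onto a detected element's center.

    1. If the point lies inside one or more boxes, use the smallest such box
       (the most specific element, e.g. a button inside a panel).
    2. Otherwise use the nearest center within `max_distance` pixels.
    3. If nothing qualifies, keep the original point.
    """
    x, y = point
    hits = [b for b in boxes if b[0] <= x < b[2] and b[1] <= y < b[3]]
    if hits:
        smallest = min(hits, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
        return center(smallest)
    best: Optional[Point] = None
    best_dist = max_distance
    for box in boxes:
        c = center(box)
        d = math.dist(point, c)
        if d <= best_dist:
            best, best_dist = c, d
    return best if best is not None else point


boxes = [(450, 320, 520, 350), (200, 200, 400, 230)]
print(snap_click((460, 340), boxes))  # (485, 335): inside the Submit button
print(snap_click((490, 360), boxes))  # (485, 335): about 25 px from its center
print(snap_click((10, 10), boxes))    # (10, 10): nothing close enough
```

Preferring the smallest containing box is a design choice for nested elements; a real grounding model may rank candidates differently.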

CLI Commands

Detect Elements

openadapt ground detect screenshot.png

Output:

Found 12 elements:
  1. Button: "Submit" at (450, 320, 520, 350)
  2. TextField: "Email" at (200, 200, 400, 230)
  ...
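To use `detect` results in a script, the text output can be parsed line by line. This sketch assumes every element line follows the `N. Type: "Label" at (x1, y1, x2, y2)` shape of the example above; the real output may vary (for example, labels that contain quotes), so treat the regex and the hypothetical `parse_detect_output` helper as assumptions:

```python
import re
from typing import List, NamedTuple, Tuple


class DetectedElement(NamedTuple):
    index: int
    kind: str
    label: str
    box: Tuple[int, int, int, int]


# Matches lines like:  1. Button: "Submit" at (450, 320, 520, 350)
LINE_RE = re.compile(
    r'^\s*(\d+)\.\s+(\w+):\s+"([^"]*)"\s+at\s+'
    r"\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)\s*$"
)


def parse_detect_output(text: str) -> List[DetectedElement]:
    """Collect element lines; the header and '...' lines don't match and are skipped."""
    elements = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            idx, kind, label, *coords = m.groups()
            elements.append(
                DetectedElement(int(idx), kind, label, tuple(int(c) for c in coords))
            )
    return elements


sample = '''Found 12 elements:
  1. Button: "Submit" at (450, 320, 520, 350)
  2. TextField: "Email" at (200, 200, 400, 230)
  ...'''
for el in parse_detect_output(sample):
    print(el.index, el.kind, el.label, el.box)
# 1 Button Submit (450, 320, 520, 350)
# 2 TextField Email (200, 200, 400, 230)
```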

Create SoM Image

openadapt ground som screenshot.png --output marked.png

Key Exports

Export	Description
`ElementDetector`	Detects UI elements
`SoMPrompt`	Creates Set-of-Mark prompts
`BoundingBox`	Element coordinates
`Element`	Detected element data

Models

Model	Size	Accuracy	Speed
`omniparser`	1.2GB	High	Medium
`som-base`	500MB	Medium	Fast
`custom`	-	-	-

References

Set-of-Mark paper
OpenAdaptAI/SoM - SoM implementation

Related Packages

openadapt-ml - Use grounding in policy learning and execution
openadapt-capture - Apply grounding to demonstrations