---
title: Screenshot with Element Labels for AI Agents
sidebarTitle: Visual Labels
description: Overlay Vimium-style numbered labels on every interactive element in a screenshot so AI agents can see and click by reference.
icon: lucide:tags
---

When the agent needs to understand **where things are on screen**, `screenshotWithAccessibilityLabels` overlays color-coded labels on every interactive element. The agent sees the screenshot, reads the labels, and interacts by reference.

## Basic usage

```js
await screenshotWithAccessibilityLabels({ page: state.page })
// Image and accessibility snapshot are automatically included in the response
```

The function overlays numbered labels (`e1`, `e2`, `e3`...) on the page, captures the annotated screenshot, then removes the labels. Both the **image** and an **accessibility snapshot** are included in the response automatically.

## Interacting with refs

Use `refToLocator` to convert a visual label into a selector string, then pass it to `page.locator()`:

```js
// From the screenshot, the agent sees label "e5" on a button
const selector = refToLocator({ ref: 'e5' })
await state.page.locator(selector).click()
```

Or use locators from the accompanying snapshot directly:

```js
await screenshotWithAccessibilityLabels({ page: state.page })
// Snapshot shows: role=button[name="Submit"] with ref e3
await state.page.locator('role=button[name="Submit"]').click()
```
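
Since labels follow the `e1`, `e2`, `e3`... pattern, a ref read off a screenshot can be sanity-checked before it is converted. The sketch below assumes every ref is the letter `e` followed by digits; `isValidRef` and `clickRef` are hypothetical helper names, not part of the documented API.

```js
// Assumption: refs look like "e" followed by digits (e1, e2, e3...).
// isValidRef is a hypothetical helper, not part of the documented API.
function isValidRef(ref) {
  return typeof ref === 'string' && /^e\d+$/.test(ref)
}

// Hypothetical wrapper: reject malformed refs before touching the page.
async function clickRef(page, ref) {
  if (!isValidRef(ref)) {
    throw new Error(`Unexpected ref format: ${String(ref)}`)
  }
  // refToLocator is the documented helper shown above.
  await page.locator(refToLocator({ ref })).click()
}
```

Inside an execution this would be called as `await clickRef(state.page, 'e5')`.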

## Color coding

Labels are **color-coded by element type** for quick visual parsing:

| Color  | Element type |
| ------ | ------------ |
| Yellow | Links        |
| Orange | Buttons      |
| Coral  | Inputs       |
| Pink   | Checkboxes   |
| Peach  | Sliders      |
| Salmon | Menus        |
| Amber  | Tabs         |

## Multiple screenshots

You can take **multiple screenshots** in a single execution. All images are included in the response:

```js
await screenshotWithAccessibilityLabels({ page: state.page })
await state.page.locator('button.next').click()
await screenshotWithAccessibilityLabels({ page: state.page })
// Both images are returned
```

## Options

| Parameter         | Type    | Default  | Description                     |
| ----------------- | ------- | -------- | ------------------------------- |
| `page`            | Page    | required | Playwright page to screenshot   |
| `interactiveOnly` | boolean | `true`   | Only label interactive elements |
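
As an illustration of the table above, the sketch below passes `interactiveOnly: false`, which presumably extends labeling to non-interactive elements as well. `labelEverything` is a hypothetical wrapper name; only `page` and `interactiveOnly` come from the documented options.

```js
// Hypothetical wrapper around the documented call.
// Assumption: interactiveOnly: false also labels non-interactive elements.
function labelEverything(page) {
  return screenshotWithAccessibilityLabels({ page, interactiveOnly: false })
}
```

Call it as `await labelEverything(state.page)`. The resulting image is likely to be busier, since more elements carry labels.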

## When to use

Use `screenshotWithAccessibilityLabels` for **complex visual layouts** where spatial position matters: grids, image galleries, maps, dashboards, canvas-based UIs.

For **text-heavy pages** (forms, articles, lists), prefer `snapshot()` with search. It is faster and uses fewer tokens, which also makes it cheaper.

Both methods share the same **ref system**, so you can switch between text and visual modes freely. A ref from `snapshot()` works the same as a ref from `screenshotWithAccessibilityLabels()`.

## Resizing images

To reduce token usage, resize screenshots before reading them back into context:

```js
await state.page.screenshot({ path: '/tmp/page.png', scale: 'css' })
await resizeImageForAgent({ input: '/tmp/page.png' })
// Resized image is automatically included in the response
```

`resizeImageForAgent` accepts `width`, `height`, `maxDimension`, `quality`, and `format` options.
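
One possible way to combine the two calls with one of those options is sketched below. `maxDimension` is assumed here to cap the longer side in pixels, and `1024` is an arbitrary illustrative value; `screenshotForAgent` is a hypothetical wrapper name.

```js
// Hypothetical wrapper: take a CSS-scaled screenshot, then shrink it.
async function screenshotForAgent(page, path = '/tmp/page.png') {
  await page.screenshot({ path, scale: 'css' })
  // Assumption: maxDimension caps the longer side, in pixels.
  return resizeImageForAgent({ input: path, maxDimension: 1024 })
}
```

Inside an execution this would be called as `await screenshotForAgent(state.page)`.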