Visual DOM vs. Accessibility Tree: What AI Needs
Why capturing the A11y tree helps LLMs understand web pages better than raw HTML.
When giving AI agents context about a webpage, developers often feed the LLM the raw visual DOM tree. But 20,000 lines of <div> tags, inline styles, data attributes, and SVG paths make it hard for the model to work out what’s actually on the page.
The screen reader approach
AI models process web pages similarly to how screen readers do. A user navigating with a screen reader doesn’t need to know the hex value of an element’s background or the z-index. They need to know the structure, boundaries, roles, and names.
This information lives in the Accessibility (A11y) tree.
<!-- The Raw Visual DOM -->
<div class="hover:bg-blue-600 transition p-4">
  <span class="text-white font-bold inline-flex">
    <svg>...</svg>
    Submit Form
  </span>
</div>
If an LLM receives the code above, it has to guess the intent. Is it a button, a link, or just a decorative banner?
<!-- The Extracted A11y Tree Node -->
{
  "role": "button",
  "name": "Submit Form",
  "focusable": true
}
When FeedbackFalcon captures a client’s page state, it extracts the computed A11y tree. Providing the model with semantic roles and accessible names lets the LLM grasp the structure of the UI immediately, and it can generate a more accurate fix because it is working with semantic intent rather than presentation markup.
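In practice, the computed tree can be pulled from the browser (for example via the Chrome DevTools Protocol's Accessibility.getFullAXTree) and then flattened into compact nodes like the one above. Here is a minimal Python sketch of that flattening step, assuming CDP-shaped node dictionaries; the simplify_ax_tree helper and the raw_nodes sample data are illustrative, not FeedbackFalcon's actual pipeline.

```python
# Sketch: flatten CDP-style accessibility nodes into compact
# role/name/focusable entries for an LLM prompt. Node shapes mirror the
# Chrome DevTools Protocol "Accessibility.getFullAXTree" response; the
# sample data below is invented for illustration.

def simplify_ax_tree(nodes):
    """Keep only semantically meaningful nodes: role, name, focusable."""
    simplified = []
    for node in nodes:
        if node.get("ignored"):
            continue  # the browser excluded this node from the A11y tree
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        # Drop structural wrappers that carry no semantic intent
        if role in ("generic", "none", "") and not name:
            continue
        entry = {"role": role, "name": name}
        # Record focusability when the browser computed it
        for prop in node.get("properties", []):
            if prop.get("name") == "focusable":
                entry["focusable"] = prop.get("value", {}).get("value", False)
        simplified.append(entry)
    return simplified


raw_nodes = [
    {"role": {"value": "generic"}, "name": {"value": ""}},  # the <div> wrapper
    {"role": {"value": "button"}, "name": {"value": "Submit Form"},
     "properties": [{"name": "focusable", "value": {"value": True}}]},
    {"role": {"value": "image"}, "name": {"value": ""}, "ignored": True},
]

print(simplify_ax_tree(raw_nodes))
# → [{'role': 'button', 'name': 'Submit Form', 'focusable': True}]
```

The decorative wrapper and the ignored image vanish, and the model receives only the node it can act on.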