Visual DOM vs. Accessibility Tree: What AI Needs
Why capturing the A11y tree helps LLMs understand web pages better than raw HTML.
When giving AI agents context about a webpage, developers often feed the LLM the raw visual DOM tree. But 20,000 lines of <div> tags, inline styles, data attributes, and SVG paths make it hard for the model to work out what’s actually on the page.
The screen reader approach
AI models process web pages similarly to how screen readers do. A user navigating with a screen reader doesn’t need to know the hex value of an element’s background or the z-index. They need to know the structure, boundaries, roles, and names.
This information lives in the Accessibility (A11y) tree.
<!-- The Raw Visual DOM -->
<div class="hover:bg-blue-600 transition p-4">
  <span class="text-white font-bold inline-flex">
    <svg>...</svg>
    Submit Form
  </span>
</div>
If an LLM receives the code above, it has to guess the intent. Is it a button, a link, or just a decorative banner?
<!-- The Extracted A11y Tree Node -->
{
  "role": "button",
  "name": "Submit Form",
  "focusable": true
}
When FeedbackFalcon captures a client’s page state, it extracts the computed A11y tree. Providing the model with semantic roles and accessible names lets the LLM grasp the structure of the UI immediately, and it can generate a more accurate fix because it is working with semantic intent rather than presentation markup.
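In practice, the computed tree can be pulled from the browser (for example via the Chrome DevTools Protocol's Accessibility.getFullAXTree) and then flattened into compact nodes like the one above. Here is a minimal Python sketch of that flattening step, assuming CDP-shaped node dictionaries; the simplify_ax_tree helper and the raw_nodes sample data are illustrative, not FeedbackFalcon's actual pipeline.

```python
# Sketch: flatten CDP-style accessibility nodes into compact
# role/name/focusable entries for an LLM prompt. Node shapes mirror the
# Chrome DevTools Protocol "Accessibility.getFullAXTree" response; the
# sample data below is invented for illustration.

def simplify_ax_tree(nodes):
    """Keep only semantically meaningful nodes: role, name, focusable."""
    simplified = []
    for node in nodes:
        if node.get("ignored"):
            continue  # the browser excluded this node from the A11y tree
        role = node.get("role", {}).get("value", "")
        name = node.get("name", {}).get("value", "")
        # Drop structural wrappers that carry no semantic intent
        if role in ("generic", "none", "") and not name:
            continue
        entry = {"role": role, "name": name}
        # Record focusability when the browser computed it
        for prop in node.get("properties", []):
            if prop.get("name") == "focusable":
                entry["focusable"] = prop.get("value", {}).get("value", False)
        simplified.append(entry)
    return simplified


raw_nodes = [
    {"role": {"value": "generic"}, "name": {"value": ""}},  # the <div> wrapper
    {"role": {"value": "button"}, "name": {"value": "Submit Form"},
     "properties": [{"name": "focusable", "value": {"value": True}}]},
    {"role": {"value": "image"}, "name": {"value": ""}, "ignored": True},
]

print(simplify_ax_tree(raw_nodes))
# → [{'role': 'button', 'name': 'Submit Form', 'focusable': True}]
```

The decorative wrapper and the ignored image vanish, and the model receives only the node it can act on.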