An Accessibility Opportunity Hidden in Modeless Web Dialogs

By @mcking65. Last active August 31, 2023.

In the current effort to resolve issues related to standardizing web dialogs (see ARIA issue 1708), it is proposed that the ARIA specification recommend that browsers and screen readers treat modeless dialogs as landmark regions. While the discussion has surfaced some potential benefits to modeless dialog landmarks, I believe equating modeless dialogs and landmark regions is not an optimal path. I might even go so far as to call it a huge mistake. That path would preclude extremely valuable user benefits that can only be realized if screen readers do not render boundaries of modeless dialogs the way they render boundaries of landmark regions.

If ARIA were to standardize modeless dialogs as landmark regions, we would give web authors yet another construct for doing little more than labeling a group of elements within a page. Yet, non-web platforms offer other grouping constructs with a different set of screen reader behaviors that are extremely beneficial to users. Web authors cannot create experiences that offer those benefits. Modeless dialogs are the ideal vehicle for bringing those beneficial behaviors to the web.

I’ll name and describe the specific constructs and behaviors, but we need some new terminology first. I’m fairly convinced that once we can have unambiguous discussion of all relevant dimensions of this issue, we can come up with a slightly different approach that could catapult many very challenging web experiences into a new plane of accessibility. This approach would likely differentiate modeless dialogs from landmark regions both semantically and in how they are supported by browsers and screen readers.

We Need More Vocabulary

One reason we have this gap between web and native screen reader experiences is that we don’t have the language we need to define it. In visual design, we can use the language of CSS to talk about every minute detail of visual presentation. We don’t have a screen reader equivalent.

So, I’m going to make up some terms that I hope will facilitate clear and concise conversation about some specific aspects of screen reader rendering. Then, I’ll use those terms to describe ways of differentiating modeless dialogs that I believe would be very beneficial.

User Interface (UI) Container Elements

For the purposes of this discussion, a UI container is a UI element that contains other elements. For example, tables and lists are containers that provide structure for other elements.

While this discussion is not limited to UI that is built with ARIA, it is worth noting that nearly all ARIA roles represent containers. They range in nature and complexity from listitem and paragraph to menu and grid. Only a handful of roles, such as img, button, and checkbox, do not serve as container roles. That is, screen readers present an image or checkbox as a single element. Users will not find a paragraph or table inside of a checkbox. However, they can find a checkbox inside of a paragraph or table. Both ARIA and HTML have rules governing such constraints.

Window Containers

One special kind of container is a window. Only half joking, I call it special because in accessibility specifications, its definition is especially vague. In fact, broader consensus on an improved definition of window is a key to unlocking the benefits of differentiating modeless dialogs from other containers.

Windows are also special in that they are the root container for an app. All other containers exist within a window. This is the case in both web and native software. When describing what a dialog is, the ARIA specification explains that the body element is analogous to an app window:

For HTML pages, the primary application window is the entire web document, i.e., the body element.

Intra-App Windows: Windows Inside of Apps

Technically speaking, nearly all native desktop apps contain or open multiple windows at the same time. Frequently, multiple windows can be simultaneously visible and interactive. If you are reading this on the web, you can open your browser’s developer tools and this content will appear in two sibling windows. If you use a native word processor on Windows, such as Microsoft Word, you can open a grammar and spelling checker adjacent to your document. If you execute a find command within a native app, a window for controlling the find function will appear within the app window.

For this discussion, it will be useful to have a term that distinguishes windows that belong to a specific app from windows that are outside that app. I will use “intra-app windows” as a shorthand way of referring to an app’s root window and all its descendants. Depending on the platform, descendant intra-app windows may be closeable, resizable, moveable, and either modal or modeless.

The ARIA window class is abstract. So, in web apps, intra-app windows do not have the role window. ARIA defines only one concrete subclass of window -- dialog. Thus, all intra-app windows that are descendants of the primary window in web apps are dialogs.
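To make this concrete in markup: the HTML dialog element already supports both modes, and the only API difference is the method used to open it. A minimal sketch (the find example, IDs, and labels here are illustrative, not taken from any spec):

```html
<!-- The same dialog element can open as a modal or a modeless
     intra-app window; only the opening method differs. -->
<dialog id="find" aria-label="Find">
  <label>Find: <input type="text"></label>
  <button id="close-find">Close</button>
</dialog>

<script>
  const find = document.getElementById("find");
  find.show(); // modeless: the rest of the page stays interactive
  // find.showModal() would instead make everything outside the dialog inert
  document.getElementById("close-find").onclick = () => find.close();
</script>
```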

If you are reading this with a strictly web development perspective, you might feel I just committed logical overreach when I described a spell-checker or find function as a window. You might argue such constructs are just another segment of a web page. Indeed, because of how web accessibility has evolved, that is often how web versions of such experiences are coded. Note, however, that I was specifically describing native apps. From an accessibility perspective, this difference between web and native software is extremely consequential, and I will next attempt to develop language that will help us effectively discuss those consequences.

Terms to Describe How Screen Readers Convey UI Container Boundaries

Screen readers have similarities in how they convey or render boundaries of UI containers. For this discussion, it is important to name and describe them.

We can classify screen reader rendering of UI containers based on the ways that screen readers respond to and communicate their boundaries. Using this lens, I believe we can define three classes of UI container boundaries and associated screen reader rendering:

  1. Porous boundaries with verbal rendering
  2. Solid boundaries with restricting rendering
  3. Hidden boundaries with silent rendering

Porous Container Boundary Rendering

There are many types of containers with boundaries that screen readers render as porous. Nearly all native applications and web pages employ a wide variety of them. On web pages, some of the most prevalent are headings, lists, tables, radio groups, and landmark regions.

Screen reader behaviors that make the rendering of a container’s boundaries porous are:

  1. Unrestricted Screen Reader Navigation: Using common screen reader navigation commands, users can navigate into and out of the container. For example, using a next item command, users can navigate into a list from the element that precedes the list.
  2. Boundary Announcements: If a screen reader treats the container as multiple items, it will typically inform users when navigation commands move the point of regard across a container boundary. For instance, when entering the beginning of a list, announce the start of the list. Note that some containers, such as headings, may be treated as a single item and that the primitives treated as items vary by platform.
  3. Container Information Announcements: Screen readers convey information about the container, such as its name and role and sometimes information about its size or structure. This information often serves as a boundary announcement, e.g., “Start of list with 5 items.”

Solid Container Boundary Rendering

Containers with solid boundaries are abundant, but whether a given screen reader will treat a given container’s boundaries as solid is a little less consistent across screen readers and platforms. All screen readers generally treat native window and modal dialog boundaries as solid, and some screen readers render some modeless dialog boundaries as solid.

Screen reader behaviors that make rendering of a container’s boundaries solid are:

  1. Restricted Screen Reader Navigation: By default, users cannot navigate beyond the container boundaries using common screen reader navigation commands. For example:
    • Using a next item command, users cannot navigate from inside a container with solid boundaries to elements outside of the container.
    • Using a command to navigate to the first or last element, the user’s point of regard will move to the first or last element within the solid container boundaries.
    • Using a command to navigate to the next heading, the user’s point of regard will not move to a heading outside the solid container boundaries.
    • Using a command to list headings or links, only headings or links inside the solid container boundaries are included in the list.
  2. Dedicated Reading Commands: Screen readers have special commands for reading the name and description of containers with solid boundaries, e.g., a command for reading the title of a window.

Users move focus into and out of containers with solid boundaries using commands provided by operating systems, apps, and web pages, e.g., Escape, F6, Alt+Tab, Windows+Tab, Command-Tab, and Command-`. For convenience, some screen readers provide alternatives to the methods provided by operating systems and apps. For example, JAWS has commands for listing open windows on the desktop or open tabs in a spreadsheet. VoiceOver provides commands for listing open apps and for listing open windows inside of an app.

Hidden Container Boundary Rendering

Containers with hidden boundaries are like containers with porous boundaries where the default screen reader behavior is to expose only the container contents without exposing the boundaries. This is usually because explicitly exposing the boundaries would create overwhelming verbosity that would interfere with comprehension. For example, screen readers do not announce when entering or leaving a paragraph. Depending on the screen reader and the platform, paragraph boundaries are used to divide content into separate items for navigation purposes. In some implementations, the paragraph boundary is represented by white space (e.g., a new line or blank line). Paragraph boundaries may also influence pauses when a screen reader is reading continuously. Some screen readers also have commands for navigating to the next or previous paragraph.

Screen reader behaviors that make rendering of a container’s boundaries hidden are:

  1. Unrestricted Screen Reader Navigation: Using common screen reader navigation commands, users can navigate into and out of the container. For example, using a next item command, users can navigate into a paragraph or generic div from the preceding element.
  2. Silent Boundaries: If the screen reader renders a boundary, it is likely rendered as white space, e.g., a new line or blank line at the end of a paragraph. Otherwise, there is no indication that the user’s point of regard has crossed the container boundary while navigating.
  3. Lack of Container Information Announcements: Screen readers do not typically convey information about the container, such as its role.

Variation Across Screen Readers

For the most part, screen readers are fairly consistent about which types of elements they treat as having porous, solid, or hidden boundaries. That is, a container that one screen reader treats as porous by default is likely to be treated as porous by another. Some screen readers offer settings that enable users to change boundaries from verbal to silent and vice versa. For example, users may choose to make comment markers silent in a document.

There is wide variation in the advanced features that let users explore beyond solid boundaries. For instance, NVDA’s object review can climb the window hierarchy all the way to the desktop, and JAWS has an unrestricted mode where users can use the JAWS cursor to explore every visible element on the screen. Additionally, in native Windows apps, containers that support structured reading in a manner that is similar to web pages are not common. However, in apps where it is supported, the characteristics of porous, solid, and hidden boundaries are consistent with web experiences.

Even with the messiness of variations and exceptions, there is substantial value in classifying screen reader container boundary rendering as porous, solid, or hidden. We can use these concepts to advance conversations about dialog modality and screen reader interoperability.

Thought Exercise: Imagine a World Without Solid Container Boundaries

What if screen readers had evolved in a way where they didn’t render any container boundaries as solid? This is not hard to imagine because while visual interfaces have boundaries, none are truly equivalent to solid screen reader boundaries. Gaze and pointing devices can wander anywhere. Even screen reader explore-by-touch features are constrained only by the edge of the physical screen.

In a world without container boundaries that screen readers treat as solid, screen reader users could navigate from the end of one app to the start of the next using the same next item reading command that they use to navigate inside of an app. They could navigate all headings on the screen without being interrupted by app boundaries. Screen readers might have a next app command similar to a next landmark region command. They might even treat apps as a kind of landmark region. Of special note, screen reader users wouldn’t need to know anything about magical operating system key shortcuts like Alt+Tab or Command-Tab. You could even imagine operating system developers accommodating this model by having only one enormous Tab ring for the entire desktop.

It is interesting to ponder how screen reader users who grew up in a world without solid UI container boundaries would respond to the world we live in today. They would need to learn a combination of screen reader commands and other commands to get around. After adapting, would they consider life more or less complex?

One natural but not conclusive way of considering this question is to imagine your own evolution in the alternative reality. I’ve spent an embarrassing amount of time doing so. I can imagine adaptations screen reader developers may have invented to support productivity in that reality. I can imagine working with those adaptations and the cognitive needs I would experience as I explore a new app or strive to ensure my actions are occurring in the appropriate context. I am confident the cognitive and physical load would be greater than it is in a world where some containers have solid boundaries and that the result would be a correspondingly more error-filled and less productive life. I am also pretty confident my response would not be unique. If the basis for this confidence is not immediately apparent, it may become more so as you consider examples of how to operate with solid container boundaries described later.

This is not to say that more containers with solid boundaries is always better. Having too many walls also creates problems. Just as wall placement is a usability matter in building design, so it is in software. App developers simply need the ability to build walls where they will likely deliver more advantages than disadvantages.

Back to Reality Where Window and Dialog Boundaries Are Solid

Needless to say, I believe we are fortunate to live in a world where we have some container boundaries that stop screen reader navigation. The fact that navigating to the next window requires a different command from navigating to the next line is a feature that reduces cognitive load and stress while also providing many efficiency advantages.

Fundamental Gap: Web Developers Can’t Make Modeless Intra-App Windows with Solid Boundaries

While modeless intra-app windows, such as find dialogs, Chrome Dev Tools, and the Word spelling and grammar checker, are abundant in native software, it is not possible to create an equivalent in web-based software. Instead, developers who work on making web-based versions of those app features accessible put them into porous containers, such as landmark regions or groups. Occasionally, where modality does not impose problems, they choose to put them in modal dialogs without dimming content outside the container.
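As an illustration of that current practice, here is a hedged sketch of a spelling checker exposed as a porous landmark region, today’s closest substitute for a modeless intra-app window. The markup is hypothetical, not taken from any particular app:

```html
<!-- Today's common workaround: the checker is a landmark region
     (a section with an accessible name maps to a region landmark),
     so screen reader navigation flows freely across its boundaries. -->
<main>
  <h1>My document</h1>
  <p>Document text being checked…</p>
</main>
<section aria-label="Spelling and grammar">
  <h2>Spelling and grammar</h2>
  <p>Suggested change: …</p>
  <button>Ignore</button>
  <button>Change</button>
</section>
```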

Perhaps a container with porous boundaries instead of an intra-app window with solid boundaries is advantageous to users in some cases. One thing I believe is certain, though, is that it is not advantageous in all cases. The lack of the ability to create modeless intra-app windows that are rendered by screen readers with solid boundaries thus represents a fundamental gap in what is possible in web development when compared to native development.

Because of this gap, numerous web apps use regions and groups to display features that may be much easier to learn and use if they were presented in containers with solid boundaries. This is one more reason why it is possible to imagine the cognitive burden screen reader users would experience if there were no such thing as solid boundaries.

Concrete Illustration of Solid Boundary Rendering

Before diving into the details of ARIA and HTML proposals, it might help to illustrate some of the screen reader behaviors associated with solid boundary rendering. As it happens, an app experience that demonstrates how NVDA, JAWS, and VoiceOver render solid boundaries around modeless windows is available to anyone reading this page with the Chrome browser. Chrome Dev Tools opens a modeless window that is directly related to the page content. One could easily imagine how the Dev Tools content could just as likely be a spelling and grammar checker or a comments panel in a word processor, or how it might be the content of a message in an email app.

Start by pressing F12 to open Chrome Dev Tools. Focus will move into Dev Tools. Now examine how screen reader navigation functions are affected by the boundaries between Dev Tools and the rendered view of the web page. Here are instructions for NVDA, JAWS, and VoiceOver.

If running NVDA:

  1. Be sure NVDA is in browse mode by pressing NVDA+Space until you hear the browse mode beep.
  2. Press Ctrl+Home to move the browse mode cursor to the first element in Dev Tools. Press Up Arrow, and the browse mode cursor does not move outside Dev Tools.
  3. Press Ctrl+End to move to the last element and press Down Arrow. Again, the cursor does not move outside Dev Tools.
  4. Press D and Shift+D to navigate by landmark region. Note that landmark navigation stays within the Dev Tools window.
  5. Press NVDA+F7 to open the elements list. Note that only elements from Dev Tools are listed.
  6. Press Shift+F6 to move focus back to the rendered HTML.
  7. Repeat steps 2 through 5 to observe browse mode navigation behavior and explore the elements list. Note that the browse cursor stays inside the rendered HTML window, and the elements list contains the rendered web page elements and does not contain elements from Dev Tools.
  8. Press F6 to move focus back to Dev Tools. Note that the browse mode cursor is still active, and its location is where you left it before pressing Shift+F6. Moving focus between the modeless windows does not cause a loss of reading position. This extremely valuable feature is only possible when there is appropriate support by both the browser and screen reader.

If running JAWS, the procedure is identical except for step one. With JAWS, you need to be sure the virtual cursor is active. It can be activated by pressing Insert+Numpad-Plus or CapsLock+Semicolon.

If running VoiceOver, the steps related to navigation assume VoiceOver is configured with the default standard grouping behavior in its navigation settings.

  1. Press Ctrl-Option-Shift-Home (FN-Ctrl-Option-Shift-Left on a MacBook) to move the VoiceOver cursor to the first element of Dev Tools. Press Ctrl-Option-Left, and the VoiceOver cursor does not move outside Dev Tools.
  2. Press Ctrl-Option-Shift-End (FN-Ctrl-Option-Shift-Right on a MacBook) to move the VoiceOver cursor to the last element of Dev Tools. Press Ctrl-Option-Right, and the VoiceOver cursor does not move outside Dev Tools.
  3. Press Ctrl-Option-U to open the rotor menu. Note that only elements from Dev Tools are listed.
  4. Move focus back to the rendered HTML. I don’t know of a keyboard command for doing this. One possible path is: press Ctrl-Option-Shift-Home, then Ctrl-Option-Shift-Up, then Ctrl-Option-Left, then Ctrl-Option-Shift-Down.
  5. Repeat steps one through three to observe VoiceOver navigation behavior and explore the rotor menu. Note that the VoiceOver cursor stays inside the rendered HTML window, and the rotor menu contains the rendered web page elements and does not contain elements from Dev Tools.

Unfortunately, since Chrome and VoiceOver do not support direct movement of focus between Dev Tools and the rendered HTML, the VoiceOver cursor always starts at the beginning when navigating between them. It is also worth noting that VoiceOver technically does not recognize Dev Tools as a window. If you use the command for first element in window (Ctrl-Option-Cmd-Home), the cursor is placed on a group that contains all content in the entire Chrome app window.

Unlike many other native modeless windows, Tab key navigation in Chrome Dev Tools does not respect the window boundary. So, navigation within either window is much less efficient than it could be, and it is easy for screen reader users to unintentionally cross the window boundaries. While this aspect of Tab navigation is related to modeless window specifications, decisions related to screen reader rendering do not depend on it, and it deserves its own, independent discussion.

Proposal: Standardize Modeless Dialogs as Solid Containers

Screen readers already render modal dialogs with solid boundaries. What if the ARIA specification set an expectation that screen readers treat modeless dialogs equivalently? In other words, instead of standardizing modeless dialogs as landmark regions, which would make them porous containers, use modeless dialogs as a vehicle for giving web app developers the ability to create web apps with multiple intra-app windows with boundaries that screen readers treat as solid.

Treating all dialogs as containers with solid boundaries would generate two significant, downstream problems. However, I hope to illustrate that both problems are readily solvable, partly because critical aspects of the solutions already exist, and that solving them yields numerous serendipities.

Problems Created by This Proposal

If screen readers were to render both modal and modeless web dialog boundaries as solid, two problems would need to be solved:

  1. There are no standardized keyboard mechanisms for moving keyboard focus among open dialogs contained within web pages.
  2. Because dialog boundaries would restrict navigation, screen reader users might not easily discover content that is outside the dialog or window that contains focus.

These are significant problems. If this proposal were to be adopted, it would be important to effectively solve both of them. Fortunately, highly effective solutions are readily within reach. As demonstrated by the Chrome Dev Tools example, we don’t have to start from scratch because existing patterns and code already provide partial solutions.

Play Along in a Hypothetical Example App

To help ensure clear discussion of these problems and the benefits of solving them, let’s imagine a hypothetical web app we can use to make discussion more concrete. This web app, dubbed Play Along, has three primary content spaces:

  1. Some app chrome that presents typical site navigation, a library of favorite games, and stuff like that.
  2. A space presenting the game the user is currently playing.
  3. A space containing group chat.

Suppose that the game and chat are each presented in modeless dialogs. The app chrome is structured in landmark regions outside the dialogs.
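Here is one possible skeleton for Play Along, a sketch under the assumptions of this hypothetical app (the element IDs and labels are invented for illustration):

```html
<body>
  <!-- App chrome: porous landmark regions outside the dialogs. -->
  <header>Play Along</header>
  <nav aria-label="Site">…</nav>
  <section aria-label="Game library">…</section>

  <!-- The two content spaces proposed as modeless intra-app windows. -->
  <dialog id="game" aria-label="Game">…</dialog>
  <dialog id="chat" aria-label="Chat">…</dialog>

  <script>
    // Open both modelessly so the user can play and chat in parallel.
    document.getElementById("game").show();
    document.getElementById("chat").show();
  </script>
</body>
```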

A Deeper Look at the Discovery Problem

An argument against this proposal and in favor of treating modeless dialogs as landmark regions is that screen readers already include robust support for landmark regions. Making one more kind of element behave like landmark regions is simple. In fact, it is so simple, most users probably wouldn’t even notice it.

However, if this proposal were adopted, screen readers would have to make far more noticeable changes. That is because if screen readers rendered modeless dialogs the way they render modal dialogs without making any additional changes, users might not find content that is presented in modeless dialogs. Or, if users are reading content inside a modeless dialog, they might not know that there is content outside of that dialog.

To ensure broad understanding of the content discovery challenges, we first need a shared understanding of why treating modeless dialogs as landmark regions would not create new content discovery problems.

The evolution of screen reader access to web pages has led users to expect the entirety of a web app to be presented as a single continuous document. When users load a page, they can use a reading cursor to traverse every element on the page, including static content elements that are not focusable or interactive. I sometimes liken a reading cursor to the gaze of a sighted user. However, a primary difference from gaze is that a reading cursor usually starts at the beginning of content and all elements are presented sequentially. In left-to-right languages, the expectation is that the first element in the reading order is at the top left of the page.

If, instead of adopting this proposal, screen readers were to present modeless dialogs as landmark regions and thus render their boundaries as porous, users could discover content inside dialogs with navigation techniques that they already use frequently. The dialog content would be presented as one more section within the document. Thus, users could stumble into dialog content when navigating by heading, region, or next item.

In the Play Along app, the game the user is playing is in one modeless dialog and the group chat is in another. If those dialogs were presented as landmark regions, the user could navigate into and out of them using screen reader navigation commands. On the other hand, if this proposal were adopted, screen readers would not present the game and chat content in the main document. Screen readers would need to do something to ensure users are aware of the dialogs and that they know how to read inside of them. In addition, if focus is inside the game, screen readers would need to do something to ensure users are aware that there is other content outside the game because normal navigation would be restricted at the edges of the game. Just as illustrated in the Chrome Dev Tools example above, the reading cursor would stay within the game dialog when using any of the normal navigation commands to move by element, such as item, heading, link, button, or region.

Problems Left Unsolved by Modeless Dialog Landmarks

While presenting modeless dialogs as landmark regions would avoid exacerbating content discovery problems, their porous boundaries would preclude using dialogs as a simple way of enabling authors to create web containers with solid boundaries. Authors have many options for creating porous containers, including landmark regions, but there is no way for web authors to create modeless, solid containers.

This lack of ability to create solid container boundaries inside of a web app matters. Content discovery is only one of many challenges faced by screen reader users in complex web apps. Containers with solid boundaries could help web authors solve a multitude of other usability problems that are practically impossible to address today. Some of these include:

  • Comprehension: Users sometimes struggle to comprehend an app because the sheer complexity of a single document that contains a vast array of diverse content can be overwhelming.
  • Attention Focus: When exploring for the purpose of completing a task or discovering how to complete a task and the page includes a wide variety of disparate functionality, users must expend significant mental energy to be confident that their exploration stays within the bounds of the desired functionality.
  • Safety: When it is too easy to accidentally change context, users can easily be mistaken about the context in which they are performing a task.

Consider some of the issues that might arise in our hypothetical Play Along app. A game might include several landmark regions and a variety of inputs. Similarly, the chat app may also include multiple landmark regions and a variety of inputs. The developer of the app might not even have full control over the names of elements within the game if some of the games are developed by third parties.

Imagine trying to understand a new game where the game is a region that contains a bunch of regions. Using region navigation, there is no way to move to the beginning of the game region with a single command. Nor is there a single command to find the end of the game content. Using region navigation, it would take a lot of focused energy to understand what is inside the game and what is outside the game. The list of links wouldn’t help much. It would require a significant amount of time to discover which links are the first and last links within the game content. This would be more difficult if the links were changing on a regular basis.

Some games might include text inputs. Now try to imagine the horror users might feel after learning they have just poured some rapid-fire game play into the chat composer. Such a mistake could impact users’ sense of safety and confidence during play. It would be a common mistake if screen reader commands for navigating to the next input field were able to jump out of the game and into chat.

After studying the previously described Chrome Dev Tools example, hopefully it is obvious that solid boundaries around the game and chat experiences could radically simplify learning and reduce safety problems.

Benefits of this Proposal

This proposal would enable modeless dialogs to close a fundamental gap between web and native accessibility, could help authors solve a wide variety of usability challenges in complex web apps, is consistent with the ARIA ontology, and provides a practical path to greater interoperability that may not be otherwise achievable.

Specifically, immediate benefits of standardizing modeless dialog boundaries as solid include:

  1. Web authors would have the ability to create web apps that include multiple windows, giving them accessible experience options that currently exist only in native software.
  2. Screen reader users would experience unprecedented gains in simplicity and productivity in many complex web apps.
  3. Consistent treatment of all dialog boundaries regardless of modality could simplify learning for new screen reader users.
  4. Screen reader users would benefit from more consistency between web and native software.
  5. Industry would benefit from clearly defined and more consistent screen reader container rendering across platforms.
  6. It would avoid the need for either additional special-case exceptions in the ARIA specification or a restructuring of a portion of the ARIA class ontology. In the ARIA ontology, landmarks are structures, and dialogs are windows, which are not structures.

If modeless dialogs are not rendered with solid boundaries, intra-app window rendering would continue to be possible only in native software. That means there would always be some usability problems that uniquely impact screen reader users and that can only be solved with native software. Since screen reader usability is almost never the primary driver of platform choice for app developers, screen reader users would pay the price. In several ways, they are already paying it. So, let’s dive into what it would cost to eliminate that pain.

Solving Problem 1 – Moving Focus Among Open Modeless Windows and Dialogs

Standardized, browser-provided keyboard commands for navigating an open window ring, i.e., commands to move focus to next and previous open window, would solve this problem. We have long had standards that support Tab key behavior. They provide ample precedent for solving this acute gap in web accessibility.

To provide window ring navigation commands that would support windows spawned by web apps, both a root HTML document and each of its open descendant dialogs would be classified as sibling child windows within the browser. Thus, the windows that belong to a web page would be included within the browser’s ring of child windows. If the browser is not in full-screen mode, the browser’s ring of child windows would contain both windows from the currently active web page and one or more browser-provided child windows, e.g., its chrome, bookmark bar, inspector, etc. In the Play Along app, the window ring would include the root document, the game dialog, and the chat dialog in addition to the browser chrome.

If the root document or a descendant modeless dialog is inert due to an open modal, the inert windows would not be treated as open by the window navigation commands. An important condition to discuss is whether it should be possible for a branch of the window hierarchy to be inert without making the root document and sibling window branches inert. For example, in Play Along, if a modal opens in the game, should the author have the option of letting the user move focus into the sibling chat dialog to respond to a message while the game modal is open?

Like with Tab, DOM order would determine order in the window ring. While this proposal does not depend on implementation of Open UI pop-ups, the Open UI popup proposal could give authors additional control over navigation order since it has an objective:

Allow these top layer elements to reside at semantically-relevant positions in the DOM. I.e., it should not be required to re-parent a top layer element as the last child of the document.body simply to escape ancestor containment and transforms.

On the Windows platform, the previously mentioned F6 convention is well-established. F6 and Shift+F6 could serve as next and previous window commands. I’m not aware of an equivalent convention for macOS. To benefit people who use both platforms, adopting F6 would be nice. However, it might also be good to support a more mac-esque assignment.
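Until browsers provide such commands, an author could approximate the proposed window ring in script. The sketch below is a rough, author-land approximation under this proposal’s assumptions, not standard behavior: it treats the root document plus each open, non-modal dialog element as the ring, in DOM order, and binds F6 and Shift+F6.

```js
// A rough approximation of the proposed window ring: the root document
// first, then each open modeless dialog in DOM order. Dialogs that are
// explicitly inert or currently modal are skipped.
function windowRing() {
  const dialogs = [...document.querySelectorAll("dialog[open]")]
    .filter((d) => !d.inert && !d.matches(":modal"));
  return [document.body, ...dialogs];
}

// Find the ring member containing focus. Check dialogs before the body,
// because the body contains the dialogs too.
function currentWindow(ring) {
  const active = document.activeElement;
  return ring.slice(1).find((d) => d.contains(active)) ?? ring[0];
}

document.addEventListener("keydown", (event) => {
  if (event.key !== "F6") return;
  event.preventDefault();
  const ring = windowRing();
  const step = event.shiftKey ? -1 : 1;
  const index = ring.indexOf(currentWindow(ring));
  const next = ring[(index + step + ring.length) % ring.length];
  // Move focus to the window's first focusable element (a simplified query).
  next
    .querySelector("button, [href], input, select, textarea, [tabindex]")
    ?.focus();
});
```

Note what this sketch omits: per-window restoration of the last reading and focus position, which the Dev Tools example shows is a large part of the value, and which likely requires browser and screen reader support rather than author script.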

It would be lovely if iOS VoiceOver were to use four-finger swipe up and down to rotate through open modeless dialogs or windows within an app and for that gesture to be supported in web-based apps. Four-finger swipe up and down would logically align with the other four-finger gestures: four-finger swipe left and right rotates through open apps and four-finger tap navigates to first and last elements. Obviously, it would be ideal if this latter gesture kept focus within the current modeless window.

Note that in most scenarios, users should not have to know the standardized navigation key because apps that employ modeless dialogs should include context-appropriate methods for navigating. For example, if a word processor were to render its comments sidebar in a modeless dialog, the context menu for commented text in a document may include an option, with its own key assignment, for opening or moving to the associated comment. Similarly, the comment element within the comments dialog may include a button or link for navigating to the commented text within the document. In Play Along, the author might provide a hotkey, like Alt+G, to move focus from anywhere to the game dialog and another, like Alt+C, to move to the chat from anywhere.
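A sketch of the Play Along hotkeys just mentioned; Alt+G and Alt+C are the example assignments from this discussion, and the element IDs are the hypothetical ones from the skeleton above:

```js
// Hypothetical app-defined shortcuts (from the Play Along example):
// Alt+G jumps to the game dialog, Alt+C to the chat dialog, from anywhere.
document.addEventListener("keydown", (event) => {
  if (!event.altKey || event.ctrlKey || event.metaKey) return;
  // Use event.code because Option/Alt can change event.key on some platforms.
  const id = { KeyG: "game", KeyC: "chat" }[event.code];
  if (!id) return;
  event.preventDefault();
  document
    .getElementById(id)
    ?.querySelector("button, [href], input, select, textarea, [tabindex]")
    ?.focus();
});
```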

Ideally, authors would create all modeless dialogs using the HTML dialog element and pick up browser support for free. However, if there are circumstances where there is a good reason to use the ARIA dialog role instead, it would be helpful if authors could hook into the browser’s window ring via JavaScript. I have no idea if that is feasible or even consistent with the principles of JavaScript architecture. If not, ARIA could include normative authoring requirements regarding the provision and surfacing of author-specified keys. If the app provides a self-managed window ring, perhaps the key for cycling through that window ring could be specified with the aria-hotkey property on the HTML body element. It might also be helpful for the ARIA specification to suggest use of aria-hotkey on dialog elements. Ideas for exposing window ring navigation keys to screen reader users are discussed in the next section.
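For concreteness, this is how the proposed aria-hotkey property might look in markup. To be clear, aria-hotkey is hypothetical: it does not exist in any published ARIA specification, and nothing consumes it today.

```html
<!-- Hypothetical markup: aria-hotkey is only a proposal in this article. -->
<body aria-hotkey="F6">
  <!-- ARIA dialog role used where the dialog element is not an option. -->
  <div role="dialog" aria-label="Game" aria-hotkey="Alt+G">…</div>
  <div role="dialog" aria-label="Chat" aria-hotkey="Alt+C">…</div>
</body>
```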

Solving Problem 2 – Making Dialog Content Discoverable for Screen Reader Users

Interestingly, the discovery problem posed by multiple windows within an app has not been very well addressed for native apps. If you are blind and don’t know how to explore for open modeless windows in a native app, you might not become aware of them. So, effectively solving this discovery problem for web apps might start tilting the playing field toward the web as the platform offering the greatest potential for usable accessibility. It would offer both the benefits of structured reading for all elements and the containerization advantages of modeless windows without the content discovery problems posed by intra-app windows in native software.

Fortunately, it isn’t likely to be especially difficult to make all content discoverable in web apps with modeless dialogs that have solid boundaries. Here are some approaches: some have already been mentioned, some are extensions of existing behavior, and a few are new, simple ideas. The important matter will be settling on some base expectations to facilitate interoperability.

New Dialog Announcements: Page load summaries could include information about open modeless windows and how to navigate among them. When new windows open without receiving focus, they could be announced, and at default verbosity, the window navigation key could be announced. The Windows version of Chrome already does something like this after a download; screen readers announce the download and tell users they can cycle to the download bar with Shift+F6.

Dialog Element Rendering: At the point in the reading order of a page where a dialog element is located, render the dialog element as a single item (see the markup sketch after this list):

  • The rendering would be similar to how a button or link is rendered. It would include the name and role of the element, e.g., “Game Dialog”.
  • The rendering could also include a hotkey if specified via aria-hotkey, e.g., “Game Dialog (Alt+G)”.
  • Moving the reading cursor to the dialog element could optionally announce the dialog description if provided via aria-description or aria-describedby.
  • Moving the reading cursor to the dialog element could trigger announcement of a hint or tutor message that informs users of the window ring navigation key, e.g., “Use F6 to cycle between open dialogs and the main window.”
  • Unlike links and buttons, the dialog element itself would not be focusable. Thus, it would not have a default action provided by the web app, so screen readers could give users the ability to move their reading cursor inside the dialog element with the default action key, i.e., Enter for Windows screen readers or Ctrl-Opt-Space on macOS. Of course, to be consistent with native macOS, Ctrl-Opt-Shift-Down would interact with dialog content.
  • Note: Just as in modal dialogs, it would be essential for screen readers to NOT add any new Escape key processing, leaving Escape handling to the web app. In other words, screen readers should not usurp Escape to use it as a means of moving their reading cursors outside modeless dialogs.
  • If the user has previously interacted with content inside the dialog, it would be beneficial if moving the reading cursor inside the dialog from the parent context were to restore the prior reading position.
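The authoring side of the rendering described above needs no new technology; existing ARIA naming and description wiring could feed those announcements. A sketch, with the description text invented for illustration:

```html
<!-- Existing naming/description techniques; only the single-item
     rendering ("Game Dialog (Alt+G)") is the proposed screen reader
     behavior, and aria-hotkey remains hypothetical. -->
<dialog id="game" aria-label="Game" aria-hotkey="Alt+G"
        aria-describedby="game-desc">
  <p id="game-desc" hidden>Your current match in Play Along.</p>
  …
</dialog>
```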

Boundary Hints: When a user bumps into a solid boundary and presses more than one navigation key in an apparent attempt to move beyond the boundary, provide a hint with information about the window navigation key, e.g., “You can move to content outside this dialog with F6.” High verbosity announcements could even give information about the number of open windows and their names, e.g., “Use F6 to move to open window 3 of 4 in this page, which is named Chat”.

Element List Support: Include a category for windows and dialogs in screen reader element lists. For example, the VoiceOver rotor and NVDA element list could include an option for windows and dialogs, and JAWS could re-brand the frames list (JAWSKey+F9) to list open windows and dialogs within the current browser tab. VoiceOver could make Ctrl-Opt-F2 work more like Ctrl-Opt-F1 and include submenu items for web pages that have open modeless dialogs.

Close Window Announcements: When a modeless window closes, regardless of whether it contained focus, it could be useful for users to be alerted.

There are many other ways specific screen readers could help users discover and navigate among modeless dialogs. The most important aspects of this proposal to align on would be consistent solid boundary rendering, how open, modeless dialogs are represented in their parent container, and screen reader retention of last reading position for each window.

Side Note for ARIA Geeks: Natural Alignment with ARIA Ontology

At the root of the ARIA ontology, there are three abstract role types: structure, widget, and window. The dialog role is a subclass of window. The landmark region roles are structures. In other words, the ARIA ontology already recognizes both modal and modeless dialogs as containers that are differentiated from structures.

So, treating both modal and modeless dialogs as windows is consistent with both the current ontology and the current categorization of roles.

It’s Worth the Work

Apps from email and word processors to spreadsheets and video editors are available on the web. It is no secret that few, if any, screen reader users prefer to use a web version of such apps if a native version is available. One of the most fundamental reasons that the web versions have severe accessibility disadvantages is because web authors have very few ways of effectively scoping and directing screen reader user attention. The current situation is similar to a visual design language where the only two colors are black and white, only one font is available, only 2 font sizes are available, text decorations are not supported, all elements are positioned sequentially with the same spacing, and all margins, padding, and borders have identical appearance.

Dialog standards are a ready-made opportunity to address a significant dimension of this problem. They provide an opportunity to increase the potency of screen reader web experience design by an order of magnitude. So, while implementing this proposal is a bit larger lift than revising ARIA to specify modeless dialogs as landmark regions, it is certainly not an impractical stretch, and the potential benefits are enormous. Experiences that were previously impossible to create would become easy to build. Some previously extremely complex web experiences could be radically simplified. If you start looking for opportunities to solve hard screen reader experience usability problems in complex sites, a fantastic array of possibilities appears.

@mbgower commented Sep 7, 2022

Thanks for the interesting article, @mcking65. I'm going to have to return to it to digest a few parts.

As someone who spent a lot of time scripting JAWS in the first decade of this century, it brought back memories of FocusChangeEvent and the Window detection that was inherent in that paradigm. Picking up that whole chunk of JAWS basic scripting and shoving it into an uncooperative app's script file, then beginning to fiddle with it used to be one of the primary tactics of improving JAWS' ability to understand context and recognize when it had jumped both components and some forms of boundaries.

Another big takeaway for me was your comment on the oddity of the Tab key not being restricted to the Chrome dev console (one can tab right out of it 'accidentally') despite the restriction for SR users. It highlights a big consideration of this for me, and that is that I would like the solid boundaries of screen readers to match very closely with those of non-screen reader keyboard users.

We largely have this marriage in truly modal dialogs -- operation is restricted to the modal. The reason why I think it is good to try to align the two groups is that it is much easier for devs and testers to identify unintentional (or potentially problematic) porous boundaries with a keyboard than with a screen reader. So the odds of more problems being caught go up.

Finally, I was pondering the delineated (but not always solid) boundaries which a user is able to move between. I'm thinking of the F6 key in the Windows app environment, where a user could potentially work in one pane, but could rapidly jump between panes with that single dedicated key. I find I really miss the F6 switch-panes functionality in a lot of richer web apps. As you note, sighted users are at a distinct advantage here, since they can observe the peer container environment. I know we've discussed how it might be possible to get some of the flexibility back in the hands of users if browsers could 'give it up' in Full Screen mode.

@mbgower commented Aug 31, 2023

I'm returning back to this article after almost a year, @mcking65, because I'm still dealing with a similar problem space to what prompted me to read and comment on this in the first place. My desire/need? To provide app creators and users with more opportunities to improve user experience with UIs that offer multiple solid-bounded regions -- and for more alignment between the experiences of blind and sighted keyboard operators with these non-porous regions of the page.

Simply put, I'm seeing a predominance of designs now where users are faced with complex apps that offer multiple, parallel modes of interaction.

  • You may be training a new AI model (which will have a toolbox for the different classifiers, a main composition window, and a properties side bar, all of which you need to seamlessly move between).
  • You may be in a social network portal, with a main (continuously updating) content region and surrounding active chat areas.
  • You may be in a new product, and an assistant window has appeared alongside your interaction, walking you through the context-specific features.

In all of these, I'm seeing a need for modeless window boundaries. Something that will let the user easily switch between these activities, then return to their prior activity, and just as critically, at the same point of operation within it.

Instead, what a sighted keyboard-reliant user is often faced with is a series of porous regions, all in one long tab ring. The best that can be said about many of them is that at least the user can reach all points of the application by keyboard. But in reality, they are so inefficient to use by keyboard that they are inaccessible.

As I stated in my previous comment, the more alignment there is between how modeless windows behave for sighted keyboard-reliant users and screen reader users, the more likely issues (and improvements) are to be surfaced.

I'm wondering if there have been any developments in the year since you published this.
