
In this week’s episode, I discuss extended reality (XR) accessibility challenges and opportunities for generative AI (genAI) to address them, and share a shortlist of XR accessibility resources. This episode was inspired by preparing for a panel discussion at the 15th annual Augmented World Expo (AWE): How Generative AI Can Make XR Creation More Accessible.
What is accessibility?
Accessibility is the practice of making a product or service usable by everyone - however they encounter it. Accessible designs help everyone: the estimated 1.3 billion people around the world who experience significant disability, people with temporary impairments (e.g., a broken arm), and folks with situational limitations (e.g., working hands-free and eyes-free while driving).

Accessibility is closely related to inclusive design, a methodology that enables and draws on the full range of human diversity. Three principles of inclusive design are:
Recognize exclusion: Acknowledge bias and recognize the exclusions that happen because of mismatches between people and experiences.
Learn from diversity: Put people at the center throughout the process. Fresh, diverse perspectives are key to true insight.
Solve for one, extend to many: Everyone has abilities and limits. Designing products for people with permanent disabilities produces results that benefit everyone - also known as the curb cut effect. The name comes from curb cuts: sidewalk ramps originally installed for wheelchair access, and later mandated by the Americans with Disabilities Act, that make spaces more navigable for everyone (e.g., parents with strollers, workers pushing hand trucks, travelers with rolling luggage).
We will discuss these concepts in relation to the creation and consumption of XR experiences. XR is an umbrella term encompassing all forms of immersive technologies that blend the physical and virtual worlds (e.g., augmented reality (AR), virtual reality (VR)).
XR meets accessibility
XR technologies have the potential to extend human capabilities, especially in areas like learning and connecting with others when we can't be physically present together. However, these technologies remain in their early-adoption phase. If we want to realize their potential, we need to create an ecosystem that welcomes people in, from the standpoint of both consumption and creation. As we saw with the curb cut effect, when we solve for one, it extends to many.
What does the curb cut effect look like when applied to XR?
We saw a recent example at Apple’s Worldwide Developers Conference (WWDC) this week: Apple announced systemwide Live Captions for the Vision Pro headset, which let you follow along with spoken dialogue in live conversations and in audio from apps. Live Captions are essential for users who are deaf or hard of hearing. By designing for these users, everyone benefits - for example, people using the device in noisy environments, people using it in quiet environments where any audio would be disruptive (e.g., a library), non-native language speakers, and people with a learning disability, an attention deficit, or autism.
Apple has generally prioritized accessibility in the Vision Pro since day one. This is a significant improvement for the XR field, where accessibility has often been an afterthought. The XR user experience (UX) has historically been fraught with barriers: controllers that often require both hands, experiences that depend on bodies being in certain positions to move, and experiences that engage only audio or visual modalities. Of course, the Vision Pro’s $3,500 USD price tag remains a broad barrier to adoption, but prioritizing accessibility from the outset sets a good precedent for future XR UX.
The examples we’ve discussed so far relate to the consumption of XR experiences. What about XR content creation?
Accessibility and XR content creation
The XR content creation landscape is difficult to navigate, even before discussing accessibility. Dr. Reginé Gilbert, Professor at New York University’s Tandon School of Engineering, and UX practitioner Saki Asakawa created a map of the XR landscape in 2021, identifying over 60 tools you could use, depending on your goals. In my experience, many of these tools are still relevant to XR creation workflows today.

Even once you grasp this landscape, many of these software tools have steep learning curves. 3D content creation tools typically have complex user interfaces that take time to learn (e.g., 3ds Max, Cinema 4D) - not to mention being expensive to license. A notable exception is Blender, an open-source content creation tool with a strong community and numerous tutorials to help people learn. Nonetheless, creating 3D content on a flat, 2D screen still requires forming new mental models and adds cognitive load: you have to envision spatial depth on a flat canvas.
Tools like Tilt Brush and ShapesXR allow people to create 3D content while in-headset, enabling spatial creation for the medium, in the medium, and removing the abstraction layer that makes designing 3D on a 2D screen hard. However, from an accessibility standpoint, these tools generally rely on controllers as input (a barrier for people with mobility disabilities) and on highly visual user interfaces (a barrier for people with visual impairments). Many also have limited or no accessibility features and documentation.
Turning to tools for building and deploying XR experiences: Unity, a game engine commonly used for XR, requires programming knowledge (C#). The same goes for Unreal Engine (C++). There have been efforts to make code-free XR authoring tools. For example, I was part of the founding team for Adobe Aero, a code-free AR design tool. Apple’s Reality Composer has a similarly low barrier to entry, though you’ll likely export your experience into Xcode for deployment, which requires programming knowledge. While some of these tools have accessibility features, what would it look like if we instead designed them accessibility-first - for example, using voice prompts to create 3D assets or code? This approach could benefit all XR creators. Enter genAI.
GenAI’s potential for increasing XR content creation accessibility
GenAI tools enable real-time multimodal inputs and outputs (e.g., GPT-4o, Google’s Gemini, and Apple Intelligence). For example, you can use a voice prompt to generate an image, or show the system an image and have it described to you. These capabilities shift us away from default interfaces that require vision and fine motor control (e.g., navigating a user interface with a mouse), and can improve the accessibility of current technologies [e.g., 1, 2, 3].
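To make the image-to-description direction concrete, here is a minimal sketch using the OpenAI Python SDK. The model choice, prompt wording, and image URL are illustrative assumptions, not how the products mentioned in this piece are built:

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one or two sentences "
                            "for someone using a screen reader.",
                },
                {
                    "type": "image_url",
                    # Placeholder URL - swap in a real image.
                    "image_url": {"url": "https://example.com/scene.jpg"},
                },
            ],
        }
    ],
)

# The returned description could be read aloud by a text-to-speech engine.
print(response.choices[0].message.content)
```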
We are already seeing genAI intersect with XR wearables like Meta’s newest Ray-Ban glasses and Brilliant Labs’ Frame, which leverage voice interfaces and contextual understanding to help us learn about the world around us. Envision Glasses are built for people who are blind or have low vision, using AI to translate everyday visual information into speech.
For content creation, genAI-powered software enables people to use text prompts to create 3D models (e.g., 3dfy, Sloyd, Bezi), lowering the barrier to creating 3D content that can be used in XR experiences. Nvidia’s Omniverse offers genAI connectors and extensions that generate animations from body movements (Move.ai), 3D head meshes from facial scans (Lumirithmic), and photorealistic 3D visualizations of products from 360° video recordings (Elevate 3D).
While we should continue to explore no-code tools for XR content creation (another of Bezi’s capabilities), genAI can also translate natural language to code (e.g., Meta’s Code Llama, OpenAI’s Code Interpreter) and help accelerate developers’ existing workflows (e.g., GitHub Copilot). These capabilities hold potential to accelerate XR content creation workflows, as the sketch below illustrates.
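As an illustrative sketch of the natural-language-to-code idea - using the same OpenAI SDK as above for consistency, with a model choice and prompts that are assumptions rather than how Code Llama or Copilot are packaged - a creator could describe an XR behavior in plain language (typed or dictated) and receive starter code to refine:

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# The creator's request. This could just as easily arrive from a
# speech-to-text interface instead of a keyboard.
request = (
    "Write a Unity C# MonoBehaviour that slowly rotates the attached object "
    "and doubles its scale over two seconds when the user pinches it."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You write concise, well-commented Unity C# scripts "
                       "for XR interactions.",
        },
        {"role": "user", "content": request},
    ],
)

# Starter script for the creator to review, test, and refine in Unity.
print(response.choices[0].message.content)
```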
However, these capabilities do not automatically guarantee that the tools themselves are accessible. We need to prioritize including creators with disabilities in the product development process to understand accessibility gaps and opportunities. Approaches include partnering with artists with disabilities who use these creation tools in their practice, and conducting inclusive design research. For the latter, check out this inclusive design research guide, co-authored by fellow AWE panelist Dr. Molly Bloom.
Takeaways
GenAI technology has the potential to increase XR content creation accessibility, especially via flexible, multimodal interactions (e.g., voice prompts to create 3D models and code). And as we’ve seen with the curb cut effect and Live Captions, accessible designs help everyone, reducing the overall barrier to entry for creating XR experiences. To realize this potential, we need to work with creators with disabilities - for instance, via artist partnerships and inclusive design research.
Learn more
The XR Accessibility Project: An initiative driven by the XR Association and XR Access to provide a central place for developers to find solutions for creating accessible code across various XR platforms.
15-part Voices of VR podcast series on XR Accessibility: Kent Bye shares interviews with XR accessibility researchers and practitioners from the 2023 XR Access Symposium. This includes interviews with fellow panelist Sean Dougherty, Director at LightHouse for the Blind and Visually Impaired, and panel moderator, Dylan Fox, Director of Operations at XR Access.
Blind in 2040: Specs on Deck: Design fiction of a day-in-the-life of a blind person in 2040, assisted by a pair of high-tech AR glasses called Specs. Written by Dylan Fox.
If you’re at AWE this year, remember to check out our panel, How Generative AI Can Make XR Creation More Accessible. Haven’t got your conference ticket yet? Use SPKR24D for a 20% discount.
Human-Computer Interaction News
This week, we cover three major AI and spatial announcements from Apple’s WWDC:
Apple Intelligence: Apple’s new AI system allows users to quickly summarize lectures, condense long group threads, minimize distractions with prioritized notifications, and rewrite text for tone and wording, including in third-party apps. Privacy is prioritized with on-device processing for simple requests and “Private Cloud Compute” for complex ones.
GenAI-enhanced Siri: Apple Intelligence will power a more contextually aware UX in Siri. Onscreen awareness allows Siri to understand and take action on things on your screen. For instance, if a friend texts you their new address, you can say “Add this address to their contact card”. You can also take action in and across apps with Siri, making a request like, “Send the email I drafted to April and Lilly”. Siri will know which email you’re referencing and which app it’s in. This transcends the app-based model (i.e., opening multiple apps to accomplish a task) in favor of a more natural, streamlined interaction: use natural language and receive relevant output, no apps and taps involved. This capability sounds like a more functional version of the large action model proposed by the Rabbit R1.
Vision Pro: The second generation of the Vision Pro operating system (visionOS 2) was announced, enabling users to bring depth to an existing photo with machine learning. New capabilities include hand gestures that provide faster access to frequently used features, a larger, higher-resolution Mac Virtual Display, and systemwide Live Captions. Developers can use new APIs and frameworks to more easily add immersive 3D objects to an app and quickly build collaborative app experiences (e.g., a board game). The headset will also become available in new countries and regions starting June 28.
Is your team working on AR/VR solutions? Sendfull can help you test hypotheses to build useful, desirable experiences. Reach out at hello@sendfull.com
That’s a wrap 🌯 . More human-computer interaction news from Sendfull next week.