ep. 8: The UX of immersion - what Varjo XR-4’s computerization of human sight means for spatial computing use cases
9 min read
On November 27, Varjo released their XR-4 Series, an ultra-high-end, PC-based headset aimed at enterprise and government use cases such as automotive design, medical training, and flight simulation. Mixed reality (MR) content displayed in the headset is “practically indistinguishable from natural sight,” thanks to a 4K LCD panel per eye that achieves 51 pixels per degree (PPD) at the center of the panel. This approaches the 60 PPD generally accepted as the threshold for retinal resolution.
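To see why center-of-panel PPD and average PPD can diverge, here's a back-of-the-envelope sketch. The panel width and field of view below are illustrative assumptions for the calculation, not Varjo's published optics specs:

```python
def pixels_per_degree(panel_pixels: int, fov_degrees: float) -> float:
    """Average angular resolution: panel pixels spread across a field of view."""
    return panel_pixels / fov_degrees

# Illustrative numbers, not official XR-4 specs: a 3840-pixel-wide panel
# spread uniformly across a 120-degree horizontal FOV averages 32 PPD.
print(pixels_per_degree(3840, 120))  # 32.0

# Hitting 60 PPD ("retinal") uniformly across that same FOV would require
# a 7200-pixel-wide panel - which is why headsets concentrate pixels
# (via lens and optics design) at the center of the view instead.
print(pixels_per_degree(7200, 120))  # 60.0
```

The gap between the uniform average and the quoted center figure is the work the optics are doing: packing more of the panel's pixels into the central region where human acuity is highest.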
A key selling point of the Varjo headset is a high level of immersion afforded by approaching the resolution of human sight.
This focus on matching human sight is in service of immersion. Indeed, Varjo claims to make the “highest-immersion virtual and mixed reality products for advanced VR users”. We all have some sense that immersion is an important aspect of spatial computing experiences, but why, exactly? Should building for immersion be our primary focus? If not, then which use cases are better served by Varjo’s XR-4, relative to Apple Vision Pro and Meta Quest 3? We’ll define immersion, compare the importance of immersion to other aspects of the spatial user experience, and examine use cases in which XR-4-level immersion is necessary versus ‘nice to have’.
What is immersion?
Given the presumed importance of immersion in spatial computing experiences, it can be surprisingly difficult to find a clear definition. My go-to definitions come from papers by leading researchers in MR/VR:
“A description of display technology that can be objectively assessed in terms of stimuli from reality, sensory modalities, a field of view and a display resolution.” - Slater and Wilbur (1997)
“A psychological state of being enveloped by and interacting with an environment that allows users a continuous stream of experiences.” - Witmer and Singer (1998)
The common theme of these definitions is that immersion is an aspect of a headset user’s experience - the feeling of being absorbed in a new environment, different from the physical world we inhabit without wearing a headset. In the case of VR, that new world is completely digital, whereas in MR, the new world is a blend of physical and digital objects. A headset’s display resolution is key to delivering this sense of immersion to users, supporting Varjo’s claim that its retinal resolution makes it the highest-immersion device currently on the market.
Immersion and presence: related but not the same
At this point, you might be wondering if immersion and presence, a term commonly used in spatial computing, are the same. They are different, but closely related.
Presence is the cognitive feeling of being in a given scenario. There are two measurable aspects that go into experiencing presence in a virtual environment:
Place illusion (PI) - the illusion of being in a place despite knowing with full certainty that you are not in that place.
Plausibility illusion (Psi) - the illusion that what seems to be happening in-headset is really happening, despite knowing for certain that it isn't.
To make PI and Psi concrete, consider Richie's Plank Experience: you walk out on a plank, 80 stories high. You know you're not really 80 stories up (PI), and you know you're not really in danger (Psi). Yet your fear of heights is very much real, and you might shorten your steps to avoid “falling off” the virtual plank. These responses are signs that presence has been achieved in the experience.
Coherence even over immersion (in most cases)
Immersion provides the container in which PI can occur. In the same way, coherence, another aspect of the virtual environment, enables Psi to occur. Coherence means that the mental model the user has built in the spatial computing environment is preserved throughout their in-headset experience.
The bar for coherence is high - an experience needs to maintain nearly “100% logic” in its construction, and our brains are hyper-sensitive to even slight deviations. If something feels not quite right in the blended world of physical and digital content (for example, poor tracking that makes a digital object unexpectedly float away from the user, or an uncanny valley avatar), presence quickly falls apart and takes time to recover.
While much of this research has been conducted on VR experiences, I've seen the same principles apply to MR experiences. In my 2020 Augmented World Expo talk, Multisensory Perception in XR with collaborator Laura Herman, I explain the relationship between coherence and immersion (which I call “fidelity”) when building spatial computing experiences. Ideally, you should aim for both immersion and coherence in your user experience to maximize presence. However, if you have to pick one, focus on coherence. For example, you can still be highly present in a low-fidelity 8-bit video game, as long as the game maintains mental model consistency, aka “coherence.” However, as games moved into 3D in the early 2000s, we started seeing uncanny valley characters and glitches in the environment - breaks in coherence that take the user out of the experience. As content fidelity increases, so does the bar for building for coherence.
Screenshot, 2020 Augmented World Expo talk, Multisensory Perception in XR. In short, aim for both coherence and high-fidelity in your spatial computing user experience. However, if you have to choose, focusing on coherence will maximize your sense of presence relative to fidelity.
Where immersion shines
That said, immersion (aka content fidelity) can be more important than coherence if a user's goal is to see accurate, detailed digital visualization. Varjo's case studies prominently feature use cases where viewing digital content (e.g., to get a sense of scale) is the goal, such as architecture visualization. Here, retinal resolution is the priority, putting the XR-4 headset at an advantage over Meta Quest 3 or Apple Vision Pro.
Varjo's case studies also skew heavily towards training scenarios where users interact more with physical objects (e.g., a steering wheel in maritime training) than with digital ones, overlaid with high-fidelity digital content. Here, the physical objects are, by nature, coherent with the physical world.
A person uses a Varjo headset for maritime training. Notice the physical steering wheel and screen that the user interacts with while in-headset.
However, a mismatch between the physical object or input device and the digital content can break coherence and negatively impact the user experience. This may matter less in enterprise or government use cases, where the user's goal is to successfully complete training for a specific procedure. In contrast, consumer use cases like gaming (Meta Quest 3), productivity (Apple Vision Pro), and media consumption (both Quest 3 and Vision Pro) hinge on being absorbed in the content, interacting with digital objects, or both - here, coherence is primary and achieving retinal resolution is secondary.
So, how do we build for coherence?
Keep the assumption of unity, an important general principle in multisensory perception, in mind. That is, as signals from different modalities (e.g., vision, hearing) share more properties, the brain is more likely to treat them as originating from a common object or source. For example, when we perceive a sound as coming from the same direction as a visual, we judge the visual experience as more credible.
Some ways the assumption of unity might manifest in your user experience design:
If you're using both sounds and visuals in a spatial computing environment, make sure each sound is spatially and temporally aligned with its visual source to elicit maximum presence.
If you’re creating an MR environment with digital objects that behave like real-world objects, ensure physics match what you’d expect in the real world (e.g., a digital glass would fall to the ground and shatter).
If you're creating an MR environment that follows different rules from the real world (e.g., floating objects), then ensure all the parts fit together with consistent interactions and aesthetic style.
Following these principles will increase the coherence of the virtual environment, helping meet the user’s perceptual expectations and increase engagement in the experience.
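The audio-visual alignment principle above can be sketched as a simple check. This is a toy illustration, not any real engine's API: the direction vectors, the helper names, and the 5-degree tolerance are all assumptions for the example.

```python
import math

def angular_offset_deg(a, b):
    """Angle in degrees between two 3D direction vectors (as seen from the user's head)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    # Clamp to guard against floating-point drift outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def is_audio_visual_coherent(sound_dir, visual_dir, tolerance_deg=5.0):
    """Treat audio and visuals as a unified source only if their directions roughly agree.
    The 5-degree tolerance is a hypothetical threshold, not an empirical constant."""
    return angular_offset_deg(sound_dir, visual_dir) <= tolerance_deg

# A sound slightly left of its visual source still reads as one object...
print(is_audio_visual_coherent((0.05, 0, 1), (0, 0, 1)))  # True (~2.9 degrees apart)
# ...but a sound behind the user while the object is ahead breaks the assumption of unity.
print(is_audio_visual_coherent((0, 0, -1), (0, 0, 1)))    # False (180 degrees apart)
```

A real spatial audio engine handles this for you when sounds are attached to scene objects; the point of the sketch is that coherence can be framed as a measurable constraint rather than a vague aesthetic goal.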
Are you building mixed reality experiences for Varjo? Sendfull would love to partner with you to help define and evaluate your user experience - reach out at hello@sendfull.com
Human Computer Interaction News
Sensory visual data is why babies learn faster than LLMs: Yann LeCun shared a LinkedIn post detailing how animals and humans get very smart very quickly with far smaller amounts of training data than today's AI systems. After comparing the total amount of visual data seen by a 2-year-old with the amount of data used to train LLMs, LeCun concludes that there's more to learn from video than from text: video is more redundant, and that redundancy tells you a lot about the structure of the world.
Intro to LLMs: This explainer video by Andrej Karpathy describes LLMs, where they’re headed, comparisons and analogies to present-day operating systems, and some of the security-related challenges of this new computing paradigm. It includes an interesting analogy based on Daniel Kahneman’s Thinking, Fast and Slow - namely System 1 (fast, instinctive, e.g., answering 2x2) that corresponds to current LLMs: words enter the sequence, and the system predicts the next word. This is contrasted with System 2 (slower, deliberative, e.g., answering 17x24), corresponding to LLMs becoming complex decision-making systems, where we convert time to increased output accuracy.
Pillow: A relaxing MR app designed for your bedroom ceiling: This app is created for use while lying down in bed. It's an interesting approach to building retention, with the potential of people using Pillow as a before-bed ritual.
OcuLenz AR/XR headset announced to help people with advanced macular degeneration: An example of a targeted, practical use case for XR, the OcuLenz uses pixel manipulation software to process real-world images and recreate them as an AR display onto the user’s remaining good vision.
Reshaping the tree: rebuilding organizations for AI: Professor Ethan Mollick explains how AI is fundamentally shifting how work is done, and therefore requires us to rebuild organizations around this shift. He presents a possible future of what rebuilding this process would look like, and three principles for rebuilding organizations.
Where are all the ‘godmothers’ of AI? Women’s voices are not being heard: After Altman’s return as CEO, OpenAI’s newly established board of directors is now made up exclusively of white men. This means that the needs and concerns of women - not to mention people of color and other gender identities - are massively underrepresented in future AI product development and decision-making. David Polgar’s article points out that there are actually numerous women leaders in AI, but we aren’t listening to them. He shares numerous helpful resources, like a list of ten books from leading women in AI.
That’s a wrap 🌯 . More human-computer interaction news from Sendfull next week.