OpenAI recently announced Sora, a new generative video model, which can output detailed, high-definition film clips up to a minute long based on short text descriptions. This development marks a significant advance in text-to-video gen AI technology, with applications across numerous industries - especially entertainment, marketing and education.
Sora is part of a larger, gen-AI-powered paradigm shift, where we’ve shortened the loop between user intent and output to the point that we’ve obviated much of the skill required to create a video (e.g., Sora), illustration (e.g., MidJourney) or summary (e.g., ChatGPT).
My students often ask me if they should still pursue a career in design given the exponential growth of gen AI tools. My answer is yes - a design practitioner’s role is evolving to become increasingly important in this AI-generated future. Today, we explore what this evolution might look like, and how it intersects with building human-centered AI solutions.
Redefining craft
Craft in design has traditionally referred to technical skill, attention to detail and commitment to producing a high-quality product for people to use. Gen AI tools like Sora stand to redefine what we mean by “technical skill” and “attention to detail”, while keeping a commitment to producing high-quality output.
Let’s say I want to generate a video clip for my next newsletter. Instead of capturing video and dusting off Premiere Pro, I write what I want to be in my clip, and moments later, have a video, thanks to Sora. No technical videography skills required.
Let's examine the implications of this example. Would I have hired a videographer for this use case? No. If a tool like Sora didn’t exist, I simply wouldn’t use video to express myself in this instance. The AI tool lets me tell a story on my own and explore the medium of video. If professional videography were part of my practice, being able to quickly generate a video could help me experiment and iterate, potentially informing how I shoot footage - or, if the output quality were sufficient, letting me focus on orchestrating the narrative and details of the AI-generated content. Craft gets redefined, from hours spent capturing and editing to bigger-picture orchestration.
Now, enabling everyone to create means we’ll have a lot of AI-generated content. Just last year, AI generated 150 years’ worth of images in under 12 months. This stands to increase the value of high-quality, “tasteful” output. As Scott Belsky, Adobe’s Chief Strategy Officer, puts it: in the age of AI, taste will become more important than skills. Creative professionals will remain sought after as purveyors of taste, even if technical skills (e.g., proficiency in certain software or hardware) become less important. In tandem, we need protocols for transparently verifying content credentials, both so that creatives get recognition and so that consumers can rebuild trust in content. An important step came this past week, when Google and Meta joined the Content Authenticity Initiative, a cross-industry coalition of about 2,000 members spanning chip makers, camera manufacturers, software and AI developers, and media organizations.
Systems over interactions
A consequence of redefining craft is refocusing from interaction design to broader systems design. While systems design is already an established discipline, it will only grow in importance.
In the short term, this means design practitioners taking an increasingly active role in the development of AI models, rather than focusing only on the interface design that lets people interact with them. Paz Perez, UX designer at Google, provides concrete examples of what this might look like: the UX designer helps coordinate how the model is shaped as well as the interactions users need; the design researcher imagines how datasets reflect user intent and aligns them with personas; the content designer creates detailed guidelines to train models. In parallel, we’ll see machine learning engineers more closely integrated with design, resulting in collaborations that more effectively bridge technical capabilities and user needs. There is research demonstrating the increased effectiveness of this approach during AI solution ideation. It’s also a key aspect of my Problem-Solution Symbiosis Framework, which models how we can symbiotically connect problem and solution space to converge on a solution that solves for user needs.
In the long term, we’ll see the emergence of new design practitioner roles. Jorge Arango, information architect, author and educator, describes some of these potential roles, including the meta-designer, who “designs the thing that designs the thing”; the pattern wrangler, who manages a system’s cohesiveness and utility; and the coherence generator, who aligns the organization’s vision and purpose with its actions and messaging. The common theme across these roles is that design becomes increasingly strategic, with systems thinking becoming synonymous with craft.
Embracing embodiment
As extended reality (XR) and AI become increasingly intertwined, we’ll move from designing 2D systems to embodied systems - experiences that map more closely to how we move through the physical world than to the screen-based precedents that came before.
The interactions we see in the Apple Vision Pro, for example - eye gaze to make a selection, hand gestures to rotate an object in space - are part of a broader system mapped closely to human perception and cognition. They’re part of a larger shift towards what Rony Abovitz (founder of Magic Leap, Sun and Thunder; co-founder of MAKO Surgical) has called “XR Infinity”, grounded in technology that bends to meet people in context, rather than the other way around. The mental model of the real world increasingly applies to designing AI-powered digital experiences, dictating interaction paradigms.
Takeaways
I’m optimistic about the future of design practitioners in a world with pervasive gen AI technology. As these tools evolve, they open new opportunities for more people to express themselves (dare I say, “democratize”) and elevate the design practitioner into a highly strategic, systems-thinking role.
If you’re a design practitioner working on AI solutions, here are three things you can start doing today to adapt your craft for the gen AI future:
Learn how foundation models work. I’m not suggesting you become a machine learning engineer. Instead, I’m advocating learning the fundamentals, so you can think more deeply about where design can positively impact development and speak the same language as the people building these models. If you’re just starting your journey, check out this explainer from Stanford's Human-Centered AI group. For the next step, watch Andrej Karpathy’s 1-hour explainer on large language models.
Start a conversation with a teammate building foundation models. Learn about their work and how the model was built, and explore ways you can partner. For an example of what partnership could look like, check out this research showing that when you engage experts in both people’s needs and technology capabilities, ideation output is higher in both user impact and technical feasibility.
Design to augment, not replace, human capabilities. While a deep understanding of people’s needs and mental models is core to any design process, we need to layer on additional safeguards when designing for AI, to ensure we’re positively expanding human capabilities. As you develop a new AI-powered solution, ask:
What human challenges can people face better with your product than alone (or than with existing products)?
What capabilities does your product unlock for people - something they couldn’t do as well before? Think cognitively, such as how your product impacts attention, decision making and memory.
What capabilities does your product replace, and what does that mean for the user and the people around them?
Beyond evolving our skills as design practitioners, these actions help center AI technology development around people, while fostering interdisciplinary dialogue and keeping us connected to technical capabilities.
Human-Computer Interaction News
Brilliant Labs’ Frame glasses serve as a multimodal AI assistant: If you need more evidence of the deepening connection between XR and AI, look no further. These glasses feature an integrated multimodal AI assistant (“Noa”), with a gen AI system capable of running GPT-4, Stability AI, and the Whisper AI model simultaneously. This lets the glasses perform real-world visual processing, novel image generation, and real-time speech recognition and translation, so users can learn more about their environment while remaining engaged in the world (rather than looking down at a phone screen).
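Noa’s exact stack isn’t public in detail, but the underlying pattern - speech in, vision-grounded reasoning, answer out - is easy to sketch. Below is a minimal, hypothetical illustration using the OpenAI Python SDK: the file paths stand in for whatever the glasses’ microphone and camera capture, and the model names are placeholder assumptions, not the product’s actual configuration.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def answer_spoken_question(image_path: str, audio_path: str) -> str:
    """Toy glasses-style assistant loop: transcribe a spoken question,
    then answer it against a captured camera frame. Paths are placeholders
    for the device's microphone and camera streams."""
    # 1. Speech in: transcribe the user's spoken question.
    with open(audio_path, "rb") as audio:
        question = client.audio.transcriptions.create(
            model="whisper-1", file=audio
        ).text

    # 2. Vision in: encode the captured camera frame for the multimodal model.
    with open(image_path, "rb") as img:
        frame_b64 = base64.b64encode(img.read()).decode()

    # 3. Reason over both modalities; the result would be spoken back to the user.
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


# Usage (with whatever frame/audio the device just captured):
# print(answer_spoken_question("frame.jpg", "question.wav"))
```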
Meta releases Video Joint Embedding Predictive Architecture (V-JEPA) model: This non-generative model learns by predicting missing or masked parts of a video in an abstract representation space. V-JEPA differs from a generative approach like Sora because it doesn’t fill in every missing pixel; instead, it forms a mental model of the world and predicts consequences, much as humans intuit what comes next.
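To make the masked-prediction idea concrete, here is a toy sketch of JEPA-style training in PyTorch - an illustration of the general technique, not Meta’s code. The key point is that the loss is computed between predicted and target embeddings rather than pixels; the tiny encoders, random patches, and pooled predictor are simplified placeholders.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in for a video patch encoder (a ViT backbone in practice)."""
    def __init__(self, patch_dim=768, embed_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(patch_dim, embed_dim), nn.GELU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, patches):            # patches: (batch, n_patches, patch_dim)
        return self.proj(patches)          # -> (batch, n_patches, embed_dim)


context_encoder = TinyEncoder()
target_encoder = TinyEncoder()             # in practice a slow-moving (EMA) copy
predictor = nn.Linear(256, 256)

video_patches = torch.randn(4, 196, 768)   # toy batch of flattened spatio-temporal patches
mask = torch.zeros(196, dtype=torch.bool)
mask[50:120] = True                        # pretend these patches are hidden from the model

context_z = context_encoder(video_patches[:, ~mask])    # encode only the visible patches
with torch.no_grad():
    target_z = target_encoder(video_patches[:, mask])   # targets live in embedding space

# Crude pooled prediction of the masked-region embeddings; the real architecture
# predicts per-patch targets conditioned on their positions.
pred_z = predictor(context_z).mean(dim=1, keepdim=True)
loss = nn.functional.mse_loss(pred_z.expand_as(target_z), target_z)
loss.backward()
print(f"latent prediction loss: {loss.item():.3f}")
```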
Start-up uses AI to enhance STEM accessibility for people with visual impairments: Over 96% of content available today is incompatible with the assistive technologies that people with disabilities use. The co-founders of I-STEM, Kartik Sawhney and Shakul Sonker, developed a deep learning computer vision model trained on a range of STEM materials (e.g., academic papers, posters, presentations) to convert them into accessible formats. The model achieves 92% accuracy and can read complex documents.
Launching a new gen-AI-powered product? Sendfull can help you measure user value. Reach out at hello@sendfull.com
That’s a wrap 🌯 . More human-computer interaction news from Sendfull next week.