Google's Genie Creates Playable Worlds from Sketches

👋 Hi, I am Mark. I am a strategic futurist and innovation keynote speaker. I advise governments and enterprises on emerging technologies such as AI or the metaverse. My subscribers receive a free weekly newsletter on cutting-edge technology.

The field of generative AI has seen tremendous advances in recent years, with models capable of generating remarkably realistic images, videos, and text. However, most of these models focus on passive generation from a prompt. In their new paper, researchers from Google DeepMind introduce an exciting new paradigm: generative interactive environments.

Imagine sketching a whimsical landscape on a napkin over lunch and, by evening, stepping into it as a playable 2D world. That's not a page from a sci-fi novel; it's the premise behind Google's latest AI marvel, Genie.

Unlike the magical beings of lore, this Genie doesn't grant three wishes but offers endless possibilities to creators, transforming mere images into interactive experiences. Trained on a vast trove of gameplay footage, Genie crafts worlds more aligned with classic platformers than VR, but its implications ripple far beyond gaming.

The model can take a text or image prompt and generate an entire playable, game-like environment. What's more, Genie is trained without any action labels or supervision, using only raw internet videos of people playing games. This allows it to learn in a completely unsupervised manner, opening up the possibility of internet-scale training.

Under the hood, Genie consists of three core components: a video tokenizer, a latent action model, and a dynamics model. The tokenizer compresses the raw video frames into discrete tokens. The latent action model then infers a discrete set of "actions" between frames, despite no ground truth being available. Finally, the dynamics model takes the frame tokens and latent actions as input, and predicts the next frame in an autoregressive manner.
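To make the data flow between the three components concrete, here is a deliberately tiny sketch. It is not Genie's implementation: the real components are learned neural networks, whereas every function below is a hand-written stand-in. The names `tokenize`, `infer_latent_action`, `predict_next`, and `rollout`, the codebook size, and the three "shift" actions are all assumptions made up for this illustration.

```python
from typing import List

Frame = List[int]   # toy "frame": a single row of pixel intensities in 0..255
Tokens = List[int]  # discrete token ids

CODEBOOK_SIZE = 4     # hypothetical value, chosen only for this toy
ACTIONS = (-1, 0, 1)  # toy latent actions: shift left, stay, shift right


def tokenize(frame: Frame) -> Tokens:
    """Toy video tokenizer: bucket each pixel into one of CODEBOOK_SIZE ids."""
    return [pixel * CODEBOOK_SIZE // 256 for pixel in frame]


def shift(tokens: Tokens, action: int) -> Tokens:
    """Cyclically shift tokens by `action` positions (positive means right)."""
    n = len(tokens)
    return [tokens[(i - action) % n] for i in range(n)]


def infer_latent_action(prev: Tokens, nxt: Tokens) -> int:
    """Toy latent action model: choose the action that best explains the
    transition prev -> nxt. No ground-truth action label is used."""
    def mismatches(action: int) -> int:
        return sum(a != b for a, b in zip(shift(prev, action), nxt))
    return min(ACTIONS, key=mismatches)


def predict_next(history: List[Tokens], action: int) -> Tokens:
    """Toy dynamics model: next frame tokens from past tokens and an action.
    (A learned model would use the whole history; this one uses the last frame.)"""
    return shift(history[-1], action)


def rollout(prompt: Frame, actions: List[int]) -> List[Tokens]:
    """Autoregressive generation: each predicted frame is fed back as input."""
    history = [tokenize(prompt)]
    for action in actions:
        history.append(predict_next(history, action))
    return history


if __name__ == "__main__":
    # A player "presses right" twice, starting from a prompt frame.
    for step, tokens in enumerate(rollout([0, 64, 128, 255], [1, 1])):
        print(step, tokens)
```

The point of the sketch is the interface, not the internals: actions are never read from a label, only inferred from pairs of frames, and generation proceeds one frame at a time with each output becoming part of the next input.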

A key innovation is Genie's use of a spatiotemporal transformer architecture. By applying self-attention along the spatial and temporal dimensions separately, rather than across every token in the video at once, Genie can efficiently model long video sequences. Experiments confirm that Genie scales well as more parameters and data are added, culminating in an 11-billion-parameter model trained on over 200,000 hours of gaming videos.
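A rough back-of-the-envelope count shows why separating the two kinds of attention helps. The sketch below simply counts query-key pairs and ignores constants, causal masking, and everything else a real model does. The function names and the example sizes (16 frames, 400 tokens per frame) are assumptions for illustration, not figures from the paper.

```python
def full_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Every token attends to every token in the whole clip."""
    total_tokens = num_frames * tokens_per_frame
    return total_tokens * total_tokens


def factorized_attention_pairs(num_frames: int, tokens_per_frame: int) -> int:
    """Spatial attention within each frame, plus temporal attention
    across frames at each spatial position."""
    spatial = num_frames * tokens_per_frame ** 2
    temporal = tokens_per_frame * num_frames ** 2
    return spatial + temporal


if __name__ == "__main__":
    frames, tokens = 16, 400  # hypothetical sizes for illustration
    full = full_attention_pairs(frames, tokens)
    factorized = factorized_attention_pairs(frames, tokens)
    print(f"full:       {full:,}")
    print(f"factorized: {factorized:,}")
    print(f"ratio:      {full / factorized:.1f}x")
```

With these made-up sizes, the full count grows with the square of the clip length, while the dominant spatial term of the factorized count grows only linearly with the number of frames. That is the intuition behind the claim that longer videos stay tractable.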

The results are seriously impressive. Genie can take sketches, text descriptions, and even photorealistic images as prompts to generate interactive game worlds. The latent actions provide smooth control, moving characters and objects accordingly. One remarkable demonstration is Genie's ability to emulate parallax, where foreground objects move faster than distant background ones.

While limitations remain in consistency and speed, the authors argue that Genie opens up many exciting avenues for future work. It could be a general simulation engine for training reinforcement learning agents or robots. More broadly, by unlocking creative interactive experiences from any user's imagination, Genie points the way towards more humanistic generative AI.

As we marvel at Genie's potential to democratize game design, we might also ponder: How will this technology influence our perception of creativity and authorship? Are we edging closer to a world where our imaginations are the only limits, or will these tools reshape our very concept of creativity?

Read the full article on Tom's Guide.

----

💡 If you enjoyed this content, be sure to download my new app for a unique experience beyond your traditional newsletter.

This is one of many short posts I share daily on my app, and you can have real-time insights, recommendations and conversations with my digital twin via text, audio or video in 28 languages! Go to my PWA at app.thedigitalspeaker.com and sign up to take our connection to the next level! 🚀

Dr Mark van Rijmenam

Dr Mark van Rijmenam is The Digital Speaker. He is a leading strategic futurist who thinks about how technology changes organisations, society and the metaverse. Dr Van Rijmenam is an international innovation keynote speaker, 5x author and entrepreneur. He is the founder of Datafloq and the author of the book on the metaverse: Step into the Metaverse: How the Immersive Internet Will Unlock a Trillion-Dollar Social Economy, detailing what the metaverse is and how organizations and consumers can benefit from the immersive internet. His latest book is Future Visions, which was written in five days in collaboration with AI. Recently, he founded the Futurwise Institute, which focuses on elevating the world’s digital awareness.
