We often talk about the metaverse aspiring to be inclusive and bring people together. What better way to do that than to have a real-time, fluid conversation between two people who speak different languages—building new connections and communities?
Today, fewer than half of the world's population speak the top 10 languages. More than 7,000 languages are spoken worldwide, with a very long tail. From the story of Babel to the current day, linguistic barriers have divided humans and prevented coordination across borders at scale. The metaverse may soon be able to change that, connecting people who speak different languages, in different cultures and countries, with seamless communication. Among its obvious benefits, the internet of tomorrow could keep endangered languages alive while facilitating real-time collaboration across the globe.
Efforts are underway to turn this dream into a reality. In 2014, Shared Studios set up two immersive portals in galleries in NYC and Tehran, inviting strangers to come in and talk about daily life. People came out weeping and giddy. They described feeling as though they were “breathing the same air” and wanted to “reach and hug” the person through the screen. Of course, the experience still required an interpreter sitting with them. This has since grown into a business model: bringing people together through the real-time, constant immersion that comes with Web 3.0 in the metaverse. More recently, groups like Bloom have begun offering real-time simultaneous translation in over 200 languages, with a goal of reaching all 7,400 languages by the end of 2024.
As the Bloom experience highlights, there is a real disconnect between the languages spoken in the most powerful countries and those spoken only in developing regions. Many languages, particularly in Africa, are widely spoken but rarely translated. Bloom has already included many of these widely used but narrowly diffused languages in its models, but doing so requires time and resources. In the long run, tools like immersive studios and AI translation will pave the way to a world in which people can converse without knowing each other's language.
Real-time translation is already making a difference in social connections. In our master's program in translation and interpreting, we already have students who train their own AI translation models. Increasingly, translators have to be able to work with machines on automatically generated output, whether the human contributes on the front end or double-checks the translation's accuracy on the back end. While machine translation is increasingly fluent, the path to 100% accuracy remains long and arduous. There are nuances that require human intervention to catch the mistakes and take the AI to the next level.
As we take part in these conversations, there is a tangibly different level of connection when there is no lag between the written translation and the words being said—the hand motion aligns with the words. We see people bond at a very different level.
Today's models go from speech to captioning, then to text translation, and finally to speech synthesis, and each stage adds lag. The more that lag can be cut, the deeper the engagement. Captioning already works in real time, and there is real interest in going speech-to-speech and closing the gaps and lags that remain.
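The cascade described above can be sketched in miniature. This is an illustrative toy, not any vendor's API: the three stage functions, their simulated delays, and the tiny lexicon are all invented placeholders, but the structure shows why chaining recognition, translation, and synthesis makes the lags add up end to end.

```python
import time

# Placeholder stages for a cascaded speech-to-speech pipeline.
# Each function and its simulated delay is an illustrative assumption,
# not a real recognition/translation/synthesis API.

def recognize_speech(audio: bytes) -> str:
    time.sleep(0.05)  # simulated speech-recognition latency
    return "hello everyone"

def translate_text(text: str) -> str:
    time.sleep(0.08)  # simulated text-translation latency
    lexicon = {"hello": "hola", "everyone": "a todos"}  # toy dictionary
    return " ".join(lexicon.get(word, word) for word in text.split())

def synthesize_speech(text: str) -> str:
    time.sleep(0.06)  # simulated speech-synthesis latency
    return f"<audio:{text}>"

def cascaded_pipeline(audio: bytes) -> tuple[str, float]:
    # Run the three stages in sequence and measure the total lag:
    # the listener waits for the sum of all stage latencies.
    start = time.perf_counter()
    text = recognize_speech(audio)
    translated = translate_text(text)
    audio_out = synthesize_speech(translated)
    lag = time.perf_counter() - start
    return audio_out, lag

out, lag = cascaded_pipeline(b"")  # empty bytes stand in for real audio
print(out)                         # <audio:hola a todos>
print(f"{lag:.2f}s end-to-end lag")
```

Because the stages run strictly one after another, the end-to-end lag is never smaller than the sum of the per-stage delays, which is exactly why direct speech-to-speech models are attractive: collapsing stages removes latency that no single stage can optimize away on its own.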
The tech field is invested in language and translation at a level we have never seen before; it is at the forefront of their minds. Communication is critical if the metaverse is to build our communities.
Indeed, a world in which language is no longer a barrier sounds less utopian today than it would have a few years ago. The metaverse, often attacked as a dehumanizing force, in fact represents an opportunity to build new forms of human coordination: across time, space, and language.