Fusion of Realities: The Future of Human-AI Collaboration


Almost seven years ago, I contemplated the future interface for human-computer interaction. Rather than confining the possibilities to a single mode, such as voice or a combination of keyboard and screen, I pondered the integration of multiple inputs. Imagine typing text and speaking simultaneously, prompting the system to adjust its responses based on what it hears—or perhaps even sees.

Now, in 2024, with the advancement of multimodal large language models (LLMs), this concept has been revitalized. The Humane AI Pin's particular focus on audio interfaces has received mixed feedback, suggesting that audio might actually be one of the least efficient channels for communicating with computers: it is slow, unreliable, and prone to trouble with accents and speech patterns, which in turn raises legal concerns.

Let’s speculate a bit further: What if a device could monitor and process your every action and every word, except, of course, when you explicitly forbid it? Such a notion alarms many today, as both corporations and governments already know more about us than we might imagine. Increasing their knowledge could potentially backfire, leading to excessive control and the imposition of standards that align with their goals but not necessarily ours.

But imagine a society where, say, 60% of people embrace an “exoskeleton for the brain”: a system designed to enhance how they live and work. With greater comfort, less wasted energy, and significantly increased efficiency, young people who have never experienced anything else might find this integration indispensable.

Picture a multimodal AI that observes everything you do, except when you disable it for privacy reasons. It analyzes collected data to tailor the world around you, offering personalized recommendations. This system becomes a true assistant, knowing almost everything about you: your movements, purchases, readings, diet, daily routines, your dog’s name, and even when your mother calls. 

The idea definitely makes many uncomfortable now, but future generations might find it invaluable. Consider viewing this not merely through the lens of your own experiences, biases, fears, and concerns, but through the eyes of those who, perhaps not by choice, have grown up with such an “exoskeleton” from birth. Their brains have intertwined with AI assistants to such an extent that they have, effectively, become superhumans. Their AI assistants have been evolving with them, reprocessing vast amounts of collected information to give them even more power. This generation might view the integration of technology and daily life differently, seeing it as an integral part of their existence rather than a supplementary tool. For them, the boundary between human cognition and artificial intelligence is blurred, ushering in a new era of enhanced capabilities and possibly a new definition of humanity itself.

It does sound frightening, and I completely agree with you. However, to them, we are the perpetually grumbling generation, inventing problems where none exist.

Technologically, all of this is nearly achievable. It’s still costly, but firstly, Rome wasn’t built in a day, and secondly, the cost of computing resources has not only been falling very rapidly over the past century, but that decline is itself accelerating.

Let me give you an elementary example of how it could work.

Consider a simple scenario: One morning you open the refrigerator to find that you’re out of eggs. In the future, an intelligent system, having observed you countless times, would automatically add eggs to your shopping list, recognizing your weekly purchase pattern.

Today, such a system seems far-fetched due to the vast computing resources required to analyze video streams. However, even now, LLMs can interpret auditory cues, like a reminder to buy eggs, and adjust shopping lists accordingly. All you need to do is whisper, “Oh, no eggs!” when you open the fridge door. That is enough for an AI to make a note and remind you about the eggs the next time you are in the supermarket.
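
To make the whispered-reminder idea concrete, here is a minimal sketch in Python. It assumes the speech has already been transcribed to text by some other component, and it uses a toy regular expression where a real system would presumably rely on an LLM. The names ShoppingAssistant, hear, and on_location are hypothetical, invented purely for this illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Toy pattern (an assumption for this sketch): catches phrases such as
# "no eggs", "out of milk", or "need more butter". It will also produce
# false positives ("no idea"), which is why a real assistant would
# likely hand this step to a language model.
MISSING_ITEM = re.compile(
    r"\b(?:no|out of|need(?: more)?)\s+([a-z]+)", re.IGNORECASE
)


@dataclass
class ShoppingAssistant:
    """Hypothetical assistant that keeps a shopping list from overheard remarks."""

    shopping_list: List[str] = field(default_factory=list)

    def hear(self, utterance: str) -> None:
        """Note any item the speaker says is missing.

        `utterance` is text that has already been transcribed from audio.
        """
        for match in MISSING_ITEM.finditer(utterance):
            item = match.group(1).lower()
            if item not in self.shopping_list:
                self.shopping_list.append(item)

    def on_location(self, place: str) -> Optional[str]:
        """Return a reminder on arrival at a supermarket, otherwise None."""
        if place == "supermarket" and self.shopping_list:
            return "Don't forget: " + ", ".join(self.shopping_list)
        return None


assistant = ShoppingAssistant()
assistant.hear("Oh, no eggs!")                  # whispered at the fridge door
print(assistant.on_location("supermarket"))     # Don't forget: eggs
```

The point of the sketch is only the shape of the loop: listen, note, and remind at the right moment. Everything hard (transcription, understanding intent, knowing where you are) is assumed away here.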

In the years to come, if data collection from video cameras, wearables, and other devices continues, these systems could eventually know everything about us. Today, the value of extracting knowledge from petabytes of data might not yet justify its cost to individuals, governments, or corporations, but the trend is clear: within the next 10 to 15 years, computational capabilities will fundamentally change, making the knowledge derived from audio and video streams far more valuable.

Many people will oppose the intrusion into their private lives, the collection of data that should concern no one but themselves. However, it is likely that these individuals will find themselves at a competitive disadvantage compared to those who have agreed to and embraced these new rules of the game. Or perhaps compared to those who have never seen a different way of life, who grew up from infancy with such super abilities because their parents chose it for them: people who have been able to develop special talents by living the first 10 to 20 years of their lives alongside AI.

People equipped with such an informational AI exoskeleton could achieve unparalleled efficiency and comfort. Their personal voice assistants, trained on their lives 24/7, would understand them from half a word.

Whether we should fear or embrace this future remains an open question, but one thing seems likely: we might not have a say in it.
