Mira Murati’s Thinking Machines unveils AI models designed for live human interaction

Thinking Machines Lab has unveiled "interaction models," a new category of multimodal AI designed for real-time communication. These systems process audio and visual input simultaneously, enabling continuous reaction and significantly reducing res...

By Dhruv Mohan, ET Online | May 12, 2026, 10.08 AM IST

Mira Murati, founder, Thinking Machines Lab

Thinking Machines Lab, the artificial intelligence startup founded by former OpenAI chief technology officer Mira Murati, is attempting to solve one of the biggest frustrations with modern AI systems: the awkward pause between asking something and getting a response.

The company has unveiled a research preview of what it calls “interaction models,” a new category of multimodal AI systems designed for real-time communication. Unlike conventional AI models that wait for users to finish typing or speaking before generating a response, Thinking Machines is building systems that can listen, process, see and respond simultaneously.

That shift could fundamentally change how humans interact with AI.

Today’s AI systems still operate in a rigid turn-based format. Users provide a prompt, wait for processing, and then receive an answer. Over time, people have adapted to this limitation by speaking to AI in carefully structured sentences, almost as if writing emails or commands. Natural interruptions, pauses, acknowledgements and conversational cues rarely work well because existing systems are not designed to handle them in real time.

Thinking Machines argues that this becomes a major limitation if AI is expected to evolve into a genuine collaborator in environments where timing matters, including healthcare, industrial operations and customer support.

To address this, the company has developed a new architecture based on what it describes as “full-duplex” interaction. Instead of processing conversations as one long alternating sequence, the system breaks communication into micro-turns of roughly 200 milliseconds. This allows the AI to continuously react to visual and auditory input, even while it is already speaking.
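The micro-turn idea can be illustrated with a short Python sketch. Only the 200-millisecond figure comes from the company's description; the function name, the 16 kHz sample rate in the usage note and the list-of-samples representation are assumptions made for illustration, not details of the actual system.

```python
from typing import Iterator, List, Tuple

MICRO_TURN_MS = 200  # micro-turn length quoted in the article


def micro_turns(
    samples: List[float],
    sample_rate_hz: int,
    turn_ms: int = MICRO_TURN_MS,
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_ms, chunk) pairs, one per micro-turn of audio.

    Hypothetical helper: it only slices a buffer into fixed windows so that
    a model could be invoked once per window instead of once per utterance.
    """
    per_turn = sample_rate_hz * turn_ms // 1000
    if per_turn < 1:
        raise ValueError("sample rate too low for the requested turn length")
    for index, start in enumerate(range(0, len(samples), per_turn)):
        # The final chunk may be shorter than a full micro-turn.
        yield index * turn_ms, samples[start:start + per_turn]
```

Under these assumptions, one second of audio sampled at 16 kHz would be split into five windows of 3,200 samples each, and a full-duplex system would get a chance to react after every window rather than after the speaker stops.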

At the center of the system is TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model focused on fast conversational handling, presence and immediate responses. Alongside it is a secondary asynchronous “background” model responsible for more computationally intensive tasks such as reasoning, tool usage and web searches.

The idea is that while one model keeps the interaction flowing naturally, the other works quietly in parallel and feeds deeper insights back into the conversation when needed.
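A rough way to picture this split is a fast loop that keeps ticking while a slower task runs alongside it. The Python sketch below uses the standard asyncio library; the function names, the sleep durations and the placeholder strings are all hypothetical stand-ins, and nothing here reflects how Thinking Machines actually connects the two models.

```python
import asyncio
from typing import List


async def background_model(query: str, results: asyncio.Queue) -> None:
    """Stand-in for the slower model (reasoning, tools, web search)."""
    await asyncio.sleep(0.05)  # simulated heavy work
    await results.put(f"deep answer to: {query}")


async def interaction_loop(query: str, turns: int = 5) -> List[str]:
    """Stand-in for the fast model: it never blocks on the slow one."""
    results: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(background_model(query, results))
    transcript: List[str] = []
    for turn in range(turns):
        await asyncio.sleep(0.02)  # one conversational tick (shortened for the demo)
        if not results.empty():
            # Fold the background result into the conversation when it is ready.
            transcript.append(results.get_nowait())
        else:
            transcript.append(f"turn {turn}: still listening")
    await task
    return transcript
```

Calling `asyncio.run(interaction_loop("why is the pump slow?"))` returns a transcript in which the early turns are quick acknowledgements and the background answer appears partway through, without the loop ever pausing to wait for it.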

Thinking Machines says the architecture also avoids the heavy external encoders typically used for audio and video understanding. Instead, it uses what the company calls “encoder-free early fusion,” allowing raw audio and visual signals to be processed directly through lightweight embedding layers within the transformer itself. The result, according to the startup, is significantly lower latency.
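The company has not published implementation details, but the general notion of early fusion can be sketched in a few lines of plain Python. In this illustration, raw audio frames and image patches are each passed through a single linear projection and placed in one shared token sequence. The frame size (160 samples), patch size (64 pixels), embedding width (8) and random weights are arbitrary assumptions chosen to keep the example small; they are not figures from Thinking Machines.

```python
import random
from typing import List

Vector = List[float]


def make_projection(in_dim: int, d_model: int, seed: int) -> List[Vector]:
    """Random d_model x in_dim matrix standing in for a learned embedding layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(d_model)]


def split(signal: Vector, size: int) -> List[Vector]:
    """Cut a raw signal into consecutive, non-overlapping chunks (remainder dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def embed(chunks: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Apply one linear projection to each chunk; no separate encoder network."""
    return [[sum(w * x for w, x in zip(row, chunk)) for row in weights] for chunk in chunks]


def early_fusion(audio: Vector, pixels: Vector, d_model: int = 8) -> List[Vector]:
    """Return a single token sequence holding both modalities."""
    audio_tokens = embed(split(audio, 160), make_projection(160, d_model, seed=0))
    image_tokens = embed(split(pixels, 64), make_projection(64, d_model, seed=1))
    return audio_tokens + image_tokens  # one sequence a transformer could consume
```

The point of the sketch is the shape of the pipeline: each modality needs only a cheap projection before joining the shared sequence, which is where the claimed latency saving over large standalone encoders would come from.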

On FD-bench, a benchmark focused on interaction quality and conversational timing, the company claims TML-Interaction-Small achieved response latency below 0.4 seconds. For comparison, Google’s Gemini-3.1-flash-live reportedly scored 0.57 seconds, while GPT-realtime-2.0 came in at 1.18 seconds.
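To put the reported figures side by side, the short Python snippet below works out the gap between each rival and the 0.4-second ceiling claimed for TML-Interaction-Small. The numbers are the ones quoted above, converted to milliseconds; because the company only says its latency is "below" 0.4 seconds, the computed gaps should be read as minimums. The variable and function names are illustrative.

```python
from typing import Dict

# Latencies as reported in the article, in milliseconds.
TML_UPPER_BOUND_MS = 400  # "below 0.4 seconds", so this is a ceiling, not a measurement
REPORTED_MS: Dict[str, int] = {
    "Gemini-3.1-flash-live": 570,
    "GPT-realtime-2.0": 1180,
}


def minimum_gaps_ms(reported: Dict[str, int], ceiling_ms: int) -> Dict[str, int]:
    """Smallest possible latency advantage over each rival, in milliseconds."""
    return {name: latency - ceiling_ms for name, latency in reported.items()}


gaps = minimum_gaps_ms(REPORTED_MS, TML_UPPER_BOUND_MS)
for name, gap in gaps.items():
    print(f"{name}: at least {gap} ms slower")
```

On these figures the claimed advantage is at least 170 milliseconds over Google's model and at least 780 milliseconds over OpenAI's.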

While faster responses may improve consumer-facing chatbots, the bigger implications likely sit within enterprise and industrial use cases.

A real-time interaction model capable of continuously monitoring video feeds and reacting instantly could prove useful in laboratories, manufacturing environments and safety-critical operations. Instead of waiting for human intervention, the AI could detect abnormalities or safety violations the moment they occur. In customer service, lower latency could make AI conversations feel substantially more natural and less transactional.

One of the more notable aspects of Thinking Machines’ approach is the model’s built-in sense of time awareness. This enables contextual instructions such as asking the system to notify a user if a process takes longer than a previous attempt, without manually specifying timestamps or measurements.
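In conventional software, the same kind of instruction would have to be wired up by hand. The Python sketch below shows one way that might look, using a monotonic clock to compare the current run against the previous one. The class and method names are hypothetical, and the code says nothing about how the model itself represents time; it only illustrates the bookkeeping the feature is meant to spare users.

```python
import time
from typing import Callable, Optional


class DurationWatcher:
    """Flags when the current run has already outlasted the previous run."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._previous: Optional[float] = None
        self._started: Optional[float] = None

    def start(self) -> None:
        self._started = self._clock()

    def is_slower_than_last(self) -> bool:
        # With no earlier run to compare against, there is nothing to report.
        if self._started is None or self._previous is None:
            return False
        return self._clock() - self._started > self._previous

    def finish(self) -> float:
        if self._started is None:
            raise RuntimeError("finish() called before start()")
        elapsed = self._clock() - self._started
        self._previous = elapsed  # becomes the baseline for the next run
        self._started = None
        return elapsed
```

A caller would invoke `start()` when the process begins, poll `is_slower_than_last()` while it runs, and call `finish()` at the end so the next attempt has a baseline.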

For now, Thinking Machines says the interaction models are being released only to a limited set of research partners. A broader public rollout is expected later this year.
