Bridging the Gap from Simulation to Reality: Insights from the U.S. Taiwan High-Tech Forum

I recently had the opportunity to immerse myself in another powerful conversation about the future of Physical AI, this time at the 2025 U.S. Taiwan High-Tech Forum. The event brought together a remarkable group of researchers, founders, and investors who are on the front lines of bringing intelligent robots into the physical world. The discussions were candid, insightful, and offered a clear-eyed view of both the incredible progress and the immense challenges that lie ahead.

Before diving into the takeaways, I want to extend a heartfelt thank you to the speakers and panelists who shared their expertise and vision:

  • Dr. Ed Chi (VP of Research, Google DeepMind)
  • Dr. Chelsea Finn (Co-Founder, Physical Intelligence & Assistant Professor, Stanford University)
  • Dr. Ashish Kapoor (Co-Founder & CEO, General Robotics)
  • Chi Chiu (Founder, CosmicBrain AI)
  • Shin Chen (Venture Investor, E14 Fund)

In the spirit of fostering open dialogue, and in keeping with the Chatham House Rule, the following synthesis captures the key ideas from the forum without attributing specific comments to individuals. My goal is to share the collective wisdom that emerged for the benefit of other founders and investors in the robotics community.


The Thesis for Generalist Models: A New Paradigm for Robotics

One of the most compelling themes of the day was the powerful argument for a new paradigm in robotics: the rise of the generalist model. For years, the conventional path to building a robotics company involved a vertically integrated, purpose-built approach. To automate a kitchen, you built a kitchen robotics company from the ground up. To automate a warehouse, you built a logistics robotics company. Each required custom hardware, custom software, and a full-stack effort, making it incredibly challenging and capital-intensive to bring robots into the real world.

The emerging thesis, however, is that a generalist foundation model—one that can power many different robots across many different tasks—will ultimately be both more effective and easier to use than a collection of specialized, purpose-built models. This mirrors the evolution we've witnessed in the world of digital AI, where large language models like GPT-4 provide a broad base of knowledge and capability that can be adapted for countless applications, from writing emails to tutoring students.

The critical question, of course, is how to build such a model for the physical world. The data problem in robotics is fundamentally different and more difficult than in language. While language models could be trained on the vast expanse of the internet, there is no equivalent repository of physical interaction data. The consensus from the forum was that while data from industrial automation, YouTube videos of humans, and simulation all offer scale, they each have critical limitations. Industrial data lacks diversity, the embodiment gap between humans and robots is significant, and the reality gap in simulation remains a major hurdle.

The bet, therefore, is on a strategy centered around large-scale, real-world robot data, collected primarily through teleoperation. This approach was brought to life through a fascinating case study on teaching a robot to do laundry.

The Laundry Challenge: A Case Study in Building a Generalist Model

The journey to creating a robot that could successfully unload, fold, and stack laundry was a powerful illustration of the challenges and breakthroughs in modern robotics. The process began with simple "unit tests"—folding a single, flat shirt on a table. But as the complexity increased to crumpled shirts and then a full basket of mixed items, the initial approach of training a model with imitation learning repeatedly failed. For nearly three months, the team struggled, exploring everything from adding memory to the model to decomposing the task into smaller steps.

The breakthrough came from adopting a playbook proven in the world of large language models: pre-training on a massive, diverse dataset of all available robot data, followed by fine-tuning on a smaller, curated set of high-quality human demonstrations. This two-stage process was the key that unlocked the robot's ability to perform the complex, long-horizon task. The initial success—folding five items in about 20 minutes—was a watershed moment.

From there, the progress accelerated. By integrating a larger, more powerful pre-trained vision-language model (Google's 3-billion parameter PaliGemma), the performance improved dramatically, cutting the task time by more than half and significantly increasing the consistency of the folds. This demonstrated a crucial insight: the development of the core foundation model (pre-training) can be decoupled from the development of specific skills (post-training on curated data). This allows for faster iteration and the ability to leverage advances in foundation models to improve a wide range of robotic skills.
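The two-stage recipe described above can be sketched in miniature. The sketch below is purely illustrative — a toy scalar "model" and made-up data, not the actual training stack — but it shows the shape of the idea: broad pre-training on all available data, then gentle fine-tuning on a small curated set so the broad prior is preserved.

```python
# Toy sketch of the two-stage recipe: pre-train on a large, mixed-quality
# pool of robot data, then fine-tune on a small curated set of expert
# demonstrations. All names and numbers here are illustrative.

def train(model, dataset, lr):
    """Toy gradient step: nudge a scalar 'model' toward each target."""
    for target in dataset:
        model += lr * (target - model)
    return model

# Stage 1: pre-training on a diverse, noisy pool of episodes.
pretraining_pool = [0.8, 1.3, 0.2, 1.9, 0.6, 1.1]
model = train(model=0.0, dataset=pretraining_pool, lr=0.1)

# Stage 2: fine-tuning on a few high-quality demonstrations, with a
# smaller learning rate so the pre-trained prior is not overwritten.
curated_demos = [1.0, 1.05, 0.95]
model = train(model=model, dataset=curated_demos, lr=0.05)

print(round(model, 3))
```

The key property mirrored here is decoupling: stage 1 can be improved (a bigger pool, a better backbone) without touching the curated data in stage 2, and vice versa.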


From Intelligence to Skills: A Pragmatic Path to Value

While the vision of general-purpose intelligence is the long-term goal, a parallel theme at the forum was the pragmatic need to deliver value to customers today. This led to a critical distinction between intelligence and skills. As one speaker eloquently put it, "There are 8 billion people on this planet who are intelligent. Some are skillful to drive, and even fewer are skillful to fly an airplane."

For robotics companies seeking commercial traction, the immediate focus should be on developing general-purpose skills that are immediately useful. The example given was a robot arm that, out of the box, possesses the generalized skill of picking and placing any object. Such a system, which understands the affordances of different objects and how to grasp them, would provide immense value to manufacturers struggling to automate lines with thousands of unique parts.
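To make the "general-purpose skill" idea concrete, here is a hypothetical interface sketch. The class, method names, and fixed confidence value are all assumptions for illustration — no real vendor API is implied — but they show what "pick-and-place out of the box" might look like to an integrator.

```python
# Hypothetical interface for an out-of-the-box pick-and-place skill.
# All names are illustrative; a real system would infer grasp
# affordances from vision rather than grasping at the centroid.

from dataclasses import dataclass

@dataclass
class Grasp:
    """A candidate grasp: where to close the gripper, and confidence."""
    x: float
    y: float
    z: float
    confidence: float

class PickPlaceSkill:
    """A generalized skill: given any object detection, propose a grasp."""

    def propose_grasp(self, detection: dict) -> Grasp:
        # Placeholder policy: grasp at the detected centroid.
        cx, cy, cz = detection["centroid"]
        return Grasp(x=cx, y=cy, z=cz, confidence=0.9)

skill = PickPlaceSkill()
grasp = skill.propose_grasp({"centroid": (0.4, -0.1, 0.05), "label": "bracket"})
print(grasp)
```

The point of such an interface is that the manufacturer never writes per-part grasp logic — the same `propose_grasp` call covers all of their thousands of unique parts.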

This skills-based approach is a direct response to the needs of large industrial customers, such as automotive manufacturers. These companies are facing immense pressure to automate but are hampered by a fragmented market of piecemeal solutions. They don't have the time or resources to string together dozens of individual solutions. What they need is a sovereign robotics strategy—a unified platform that can deploy AI-powered skills onto their existing legacy robots and tackle a wide variety of tasks in their complex, often century-old factories. The companies that can deliver these valuable, general-purpose skills now are the ones best positioned to win these enterprise customers.


Tackling Fragmentation and the Simulation-to-Reality Gap

A recurring and urgent theme was the deep fragmentation of the physical AI space. "Nobody talks to each other right now," one founder noted. "Everybody's building their own stack." While this reflects a young and dynamic industry, it also creates massive inefficiencies and slows down collective progress. This presents a clear opportunity for platforms and initiatives that aim to connect the ecosystem, fostering collaboration and allowing developers to build on each other's work rather than reinventing the wheel.

This fragmentation is closely tied to the central challenge of the day: scaling from simulation to reality. The panel discussion made it clear that while simulation is a necessary tool, it is not a silver bullet. For tasks like navigation, simulation is largely a solved problem. However, for contact-rich manipulation tasks, the "reality gap" remains a formidable obstacle. The primary challenge isn't just physics; it's content generation. Creating the vast libraries of realistic, diverse, and high-fidelity simulated environments needed for training is an incredibly labor-intensive process.

The consensus was that the most effective path forward involves a hybrid approach, using simulation for what it's good at—safe testing of edge cases, generating large-scale but imperfect data—while relying on real-world interaction to ground the models and bridge the reality gap. The companies that master this interplay between sim and real will have a decisive advantage.


A Pragmatic Blueprint for Deployment and Safety

So, how do we deploy these increasingly capable but still imperfect systems into the real world safely and effectively? The forum offered a clear, multi-layered blueprint.

  1. Operational Pragmatism: Human-in-the-Loop. The first wave of mass deployment will not be fully autonomous. It will follow the model of airline pilots, with human operators monitoring the systems, ready to intervene when necessary. The path to autonomy is gradual, starting with a human-in-the-loop and progressively scaling to a model where one operator can oversee many robots as the systems prove their reliability.
  2. Engineering Rigor: Synthetic Data and Quality Control. On the engineering side, handling edge cases requires a robust process. This includes using synthetic data to create and test for rare scenarios that don't appear in real-world data. It also demands rigorous quality filtering of all training data—both real and synthetic—to prevent the model from learning from flawed or "hallucinated" examples.
  3. Research Innovation: Generating Code, Not Just Actions. At the research level, one of the most intriguing ideas was a shift in how we think about generating robot behavior. Instead of training models to output low-level motor commands directly, a promising direction is to have them generate robot code. Code is inherently more verifiable, testable, and predictable. We can reason about what a piece of code will do in a way that is much more difficult with the output of a neural network, offering a potential path to more reliable and trustworthy systems.
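The third point — generating code rather than raw actions — can be illustrated with a short sketch of why code is more verifiable: a generated program can be statically inspected against a whitelist of robot primitives before it is ever executed, which has no analogue for raw motor commands. The primitive names and the checker below are hypothetical.

```python
import ast

# Hypothetical whitelist of robot primitives the model is allowed to call.
ALLOWED_CALLS = {"move_to", "open_gripper", "close_gripper"}

def verify_robot_code(source: str) -> bool:
    """Statically check that generated code only calls whitelisted
    primitives -- a check that is impossible on raw motor outputs."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            if not (isinstance(node.func, ast.Name)
                    and node.func.id in ALLOWED_CALLS):
                return False
    return True

# A plausible model output: verified before execution.
safe = verify_robot_code(
    "move_to(0.3, 0.1)\nclose_gripper()\nmove_to(0.0, 0.5)\nopen_gripper()"
)

# An output calling anything outside the whitelist is rejected
# without ever being run.
rejected = verify_robot_code("import os\nos.remove('log.txt')")
print(safe, rejected)
```

Static checks like this are only one layer — real systems would add sandboxed execution and runtime limits — but they show the core claim: we can reason about what a program will do far more easily than about a stream of neural-network action outputs.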

Final Thoughts: The Road Ahead is Real

My biggest takeaway from the U.S. Taiwan High-Tech Forum was the industry's palpable shift from the theoretical to the tangible. The conversations were less about abstract, far-future possibilities and more about the concrete challenges of deployment, data collection, and delivering customer value. The laundry-folding robot is no longer a research demo; it's a product of a rigorous engineering process that points the way toward commercial viability. The need for a "sovereign robotics strategy" is not an academic concept; it's a direct demand from the factory floors of the world's largest manufacturers.

We are still in the early innings of this transformation—as one investor noted, we may have six years to go before we even see the peak of this wave. But the path forward is becoming clearer. It is a path paved with massive amounts of real-world data, guided by the principles of building general-purpose skills, and built on a pragmatic foundation of human-in-the-loop deployment. The gap between simulation and reality remains, but we are building the bridges to cross it, one real-world interaction at a time.

I am immensely grateful to the organizers and speakers for a day of deep learning and candid conversation. The future of physical AI is being built by those who are willing to get their hands dirty in the messy, complex, and ultimately rewarding physical world.


This post reflects my personal synthesis of the conversations and does not represent the official views of any participant or organization.

Written by Bogdan Cristei and Manus AI


