Human Archive: The Common Crawl for Embodied Intelligence

Physical AI and robotics will be the largest market of the next two decades. Morgan Stanley projects a $5 trillion humanoid market by 2050, and even that understates the prize: the world spends roughly $50 trillion a year on labor, the overwhelming majority of it physical. Any technology that captures even a sliver of that pool would address the largest market in tech.

The path there is gated by one thing: data. Robotics is replaying the LLM playbook, and the bottleneck at this stage is identical to the one LLMs faced a decade ago: the lack of an enormous, well-aligned, multimodal pre-training corpus. NVIDIA’s head of embodied AI research, Jim Fan, calls this the “great parallel.” Teleoperation, the dominant data strategy of the last three years, is capped at 24 hours per robot per day and is on its way out. The future is large-scale, sensor-rich, egocentric human data. Whoever builds the foundational corpus owns one of the most strategically valuable positions in AI.

Enter Human Archive.

Human Archive is building that corpus: the foundational multimodal dataset for human sensorimotor intelligence, or, as the team calls it, the Common Crawl for embodied AI. Custom wearable rigs capture aligned egocentric RGB, audio, stereo depth, tactile force feedback, motion capture, wrist cameras, and post-training data, which then moves through internal QA, anonymization, annotation, and benchmarking pipelines before shipping to frontier robotics labs and humanoid foundation model teams.

Human Archive was the most requested startup coming out of YC W26 Demo Day, and the reason is the team. The four co-founders are young, brilliant, and relentless. They have known each other for over 20 years, and each owns a distinct surface area of the company: Raj Patel (CEO) on customers, Shloke Patel on hardware, Samay Maini (CTO) on models, and Rushil Agarwal on operations, running the distributed collection network across homes, factories, and warehouses in the U.S., India, and beyond. They dropped out of Stanford and Berkeley during what was supposed to be finals week to do this full time. In their first months, the team has fanned out across San Francisco, Germany, China, and India, deploying hardware and signing data partnerships at a pace most companies do not reach in two years.

We are thrilled to lead Human Archive’s $8.2 million seed round. At Wing, we have spent years backing the data and AI infrastructure that has underpinned every major shift in AI: Snowflake, Pinecone, Voyage AI, and Deepgram, to name a few. The same pattern is now playing out for physical intelligence, and Human Archive is the team most likely to own its data layer. To borrow a line from Jim Fan, this generation may be born just in time to solve robotics.

If you want to help build the dataset that teaches the next generation of robots how to move in the world, check out Human Archive’s careers page.

Zach DeWitt
Author