Datacurve Raises $15M Series A led by Chemistry

Datacurve, a YC W24 startup that manufactures “expert-quality” coding data, has closed a $15 million Series A led by Mark Goldberg at Chemistry—a timely bet as the AI industry scrambles for better training and post-training data.  

Key Points

  • $15M Series A led by Chemistry; total funding now $17.7M.  
  • Focus: high-quality code datasets, private benchmarks, and agentic traces via a gamified contributor platform.  
  • Rising competition from data vendors like Mercor and Surge underscores demand—and scrutiny.  

Founded by Serena Ge and Charley Lee, Datacurve graduated from YC’s Winter 2024 batch. The company now lists total funding of $17.7 million on its site, following a $2.7 million seed round.

This round was led by Chemistry, the new $350 million early-stage firm co-founded by ex-Index partner Mark Goldberg, with participants including angels from top AI labs, according to the company. Chemistry’s mandate spans developer tools and infrastructure, which makes a code-data supplier a straightforward fit.

Datacurve’s pitch is focused and unglamorous: build better code data and evaluations so models actually improve on real software tasks. Its site outlines supervised fine-tuning datasets, RLHF pipelines, repo-level RL environments, and a “Private Repo Taskbench” for enterprise codebases—plus telemetry “traces” of developer behavior collected through a custom IDE. In short, not just more data, but the right data.  

How does it get that data? A gamified, bounty-based platform called Shipd recruits skilled programmers to tackle algorithmic challenges, debugging tasks, UI/UX flows, and more. Datacurve says it draws on a pool of over 14,000 contributors and pays for outputs, not hours—an incentive design meant to reward quality and speed.  

It’s an aggressive swing in a market suddenly in flux. Meta recently invested in Scale AI, long the category’s dominant supplier, and hired its founder, signaling that Big Tech wants tighter control of data pipelines. That shift opens space for specialists like Datacurve that can deliver high-signal, domain-specific corpora, in this case code.

Meanwhile, rivals are scaling fast. Mercor is reportedly courting a multibillion-dollar valuation on hefty run-rate revenue, and Surge remains a go-to for premium labeling, even as high-profile leaks have reminded buyers to scrutinize security and governance. For procurement teams, price is no longer the only line item; provenance, evaluation rigor, and IP hygiene matter as much.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.
