
Data rights are the new IP rights

Image Credit: GrandeDuc/Shutterstock

As more sophisticated resources for developers become widely available, copycat products can now be launched in a matter of hours. Software patents provided some limited protection, but feature wars rage on. Software without data is now a commodity.

These pressures make AI and the data that feed it more valuable than ever. There is no usable AI without data; AIs need data to train to minimum algorithmic performance (MAP) before they can demonstrate value to potential users and attract customers. New customers bring in more data, which is used to improve the algorithm’s performance, which attracts new customers, and so on. Each iteration of this feedback loop digs a deeper competitive moat.

Continued access to usable data is crucial to keeping this feedback loop moving. As a result, data rights have become the new IP rights. This presents opportunities and challenges for emerging startups.

Startups have a ‘clean start’ advantage

At the dawn of the previous era of startups (the cloud era), customers were hesitant to entrust their data to an outside party. To assuage these concerns, cloud-era startups would explicitly forgo all rights to the customer data they managed. Many of those agreements are still in place today, hampering cloud-era startups in their attempts to apply intelligence to their products. These startups must now either have the difficult conversation of renegotiating data rights with their existing customer base or go on an acquisition spree to get data.




Startups in the current era (the intelligence era) are approaching customers who are more comfortable letting third parties manage their data, which opens the door to a different conversation. Intelligence-era applications require high-touch data handling to effectively capture the relevant data, then clean, label, query, and analyze it to enable prediction and automation. Many enterprises that entrust storage of their data to cloud vendors remain wary of sharing deeper access to their data with those vendors, who may become competitors. They view startups as less of a competitive threat, so startups are better positioned to negotiate the rights to use this data.

Bootstrapping data to jumpstart the virtuous loop

Demonstrating value to gain leverage in negotiating data rights with early customers presents a chicken-and-egg problem for intelligence-era startups. For many applications, startups can jumpstart the virtuous loop by finding alternative sources of data to train the learning algorithm. Here are some possible approaches:

  • Target SMB and mid-market customers because they tend to have more liberal attitudes toward data rights, especially when data is exchanged for useful products at reduced prices. These smaller early customers can also serve as references that help larger customers see the value of contributing data to the training pool.
  • Hire people to train the algorithms, either as full-time employees or via Amazon Mechanical Turk.
  • Find an external source of data, such as publicly available datasets from government agencies, data purchased from third-party vendors such as Clearbit, or data scraped from relevant websites and social media.
  • Provide a freemium version of the flagship product to capture user engagement data.
  • Sell a desirable side product at cost in order to capture the data, a strategy Tesla has employed in order to build a massive dataset to train self-driving cars.

Many of these external data sources can be sufficient to train a learning algorithm to a high enough level of performance to demonstrate value and attract enterprise customers. It is imperative that intelligence-era startups build proprietary data pipelines in order to benefit from the compounding effects of pooled learnings across a customer network, making it difficult for new entrants and emerging copycats to catch up.

Strategies to structure data rights

While startups have an advantage over large incumbents in obtaining data rights, the negotiation is rarely easy. The following is an all-too-common story: A startup approaches a large enterprise with an incredible demo of a new, AI-powered workflow that promises to save the enterprise thousands of employee hours by automating a tedious and time-consuming task. The product ingests the company’s historical sales data, using it to qualify new sales leads and suggest the optimal time to call. The flashy demo blows the enterprise away, and a limited, sandboxed pilot seals the deal. The enterprise is ready to buy and roll out the solution company-wide. Unfortunately, the discussions get stuck when the deal goes to the enterprise’s chief compliance officer and lawyers for review: There is no way the startup will be allowed to access the enterprise’s data, lest it fall into the hands of the competition. But the startup’s product is less valuable to the enterprise without the relevant data to train it.

Startups can get in front of these concerns by making it clear from the outset that their interest is in learning from data and the data exhaust (such as user engagement and interaction data, metadata, and data flow information) as opposed to aggregating the customer’s data to resell to third parties. The first data rights negotiations are the most difficult. Over time, as the pool of data grows, it becomes easier to demonstrate the value of the product and the network effects of fellow customers. Startups will gain more leverage in negotiating data rights after securing the initial wave of customers and their data.

A profound gamechanger

In the cloud era, companies competed by releasing new features, which are easy to copy. Consequently, absolute market dominance was harder to achieve, and second-place players exist in many categories. The virtuous loop in data accumulation presents an opportunity for companies to achieve “winner takes most” status, which has otherwise been limited to consumer categories to date. To achieve this kind of lead, a startup’s goals should be to obtain exclusive rights to data, accumulate customer data, and form partnerships.

Incumbents and upstart rivals can no longer outspend the market leader to close the gap after startups reach a critical mass of data. For the first time in history, technology companies have an opportunity to establish robust protection against legacy incumbents and emerging copycats, far beyond what traditional intellectual property strategies have been able to offer.

[A version of this story originally appeared on Zetta Venture Partners’ Medium blog.]

Mark Gorenberg is Managing Director of Zetta Venture Partners. He has 26 years of venture capital experience, funding and serving on the boards of numerous startups. Prior to his career in venture capital, he served as a software executive, entrepreneur, and a member of the first SparcStation team at Sun Microsystems.

Ivy Nguyen is an investor at Zetta Venture Partners. She was previously Senior Associate at NewGen Capital and managed the startup accelerator program at Imagine H2O.