The Hard Economics of Selling Web Data
Market components to consider when selling pre-scraped datasets
About Data Boutique
Data Boutique is a web-scraped data marketplace. If you’re looking for web data, there is a high chance someone is already collecting it. Data Boutique makes it easy and safe to buy data from them.
The Hard Economics of Selling Data
As we have already seen, it would make enormous sense to buy pre-scraped data instead of writing a new scraper from scratch. Yet many past efforts to sell datasets didn’t catch on.
Why is that? Why do companies hire or commission external consultants to scrape rather than search for pre-scraped data? Why is build preferred to buy?
Selling data independently can be hard, as unit economics pull against it. But things look different once we understand the market.
Web scraping is a commodity
We have talked about this before: since it is feasible to hire someone to code a scraper for a fair share of websites, web scraping can be considered a commodity.
In other words, buyers have alternatives, and pricing datasets right might not be that easy.
The price trap
Pricing options are, in fact, constrained. We can’t ask for too much, as few would buy (they have too many alternatives). But we can’t drop the price far enough to sell to many more people either, because it would be uneconomical: it costs too much to reach each customer, and cheap products might cannibalize existing ones.
Not Worth Buying: The Cost of Alternatives
Some data are unique, and others are a commodity. A commodity, by definition, is hard to price high, because every buyer will weigh it against the cost of the alternatives.
This cost varies from buyer to buyer, but we can consider it the lowest of the following:
Cost and time of learning web scraping internally
Cost and time of hiring someone who can web scrape
Cost and time of commissioning the web scraping to a third party
Loss or damage from not pursuing web scraping at all
This caps the maximum price a data seller can ask, as there is so little liquidity (so few buyers) at high prices.
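The "lowest of the following" rule can be written down as a small sketch. The cost figures below are hypothetical, chosen only for illustration, and `price_ceiling` is a name we made up for this example:

```python
def price_ceiling(alternatives):
    """Return (name, cost) of the cheapest alternative open to a buyer.

    A rational buyer will not pay more for a dataset than this amount.
    """
    name = min(alternatives, key=alternatives.get)
    return name, alternatives[name]


# Hypothetical costs for one buyer, all in the same currency unit.
buyer_alternatives = {
    "learn web scraping internally": 6000.0,
    "hire someone who can web scrape": 9000.0,
    "commission a third party": 2500.0,
    "not pursue web scraping at all": 4000.0,
}

cheapest, ceiling = price_ceiling(buyer_alternatives)
print(f"Cheapest alternative: {cheapest} ({ceiling:.0f})")
print(f"A dataset priced above {ceiling:.0f} loses this buyer")
```

Each buyer has a different set of numbers, so the ceiling differs from buyer to buyer. The higher the asking price, the fewer buyers have a ceiling above it.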
Not Worth Selling: Customer Acquisition Cost (CAC)
Since data can’t be sold at a high price, the volume strategy (low prices for many buyers) seems the natural alternative.
However, sellers are not incentivized to pursue it, as they’d be operating at a loss due to high Customer Acquisition Costs (CAC): The distance a buyer will have to travel to find a pre-scraped dataset is longer than the distance to find a web-scraping expert who can do the work.
The result is a stalemate in which everybody loses: the information market, which could enable so many use cases, can’t find its way out.
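To make the loss concrete, here is a minimal sketch of the unit economics of a seller going it alone. All figures are hypothetical, and the function names are ours:

```python
def profit_per_sale(price, delivery_cost, cac):
    """Profit on one sale after delivery cost and customer acquisition cost."""
    return price - delivery_cost - cac


def break_even_cac(price, delivery_cost):
    """Highest CAC the seller can afford before a sale loses money."""
    return price - delivery_cost


# Hypothetical figures for a low-priced dataset sold independently.
price = 50.0          # volume-strategy price
delivery_cost = 5.0   # storage, support, payment fees
cac = 200.0           # marketing and sales effort to find one buyer

print(profit_per_sale(price, delivery_cost, cac))   # -155.0
print(break_even_cac(price, delivery_cost))         # 45.0
```

Under these assumptions, every sale loses money unless the seller can bring CAC below the break-even value, which is hard to do when buyers don't know where to look.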
How Marketplaces change this
The good news is that there is a way out. If there were not, humanity would still be stuck with in-house agriculture, as the same problem applies to any generally available technology.
The model that humanity found to solve this stalemate is the marketplace: the place, once physical and now digital, where buyers go to meet multiple sellers and make more, and more diverse, purchases.
Trading goods and services in a structured marketplace has two major effects that allow the market to get out of the price trap and access the benefits of pre-scraped data:
1. Lowering the transaction cost
As argued in a brilliant essay on marketplaces, a marketplace’s purpose is to lower the buyer’s effort (the transaction cost). When considering web-scraped data, the buyer’s effort can be broken down into:
Search: The effort it takes to find a reliable data provider. Among all the features a data marketplace covers, this is the bare minimum: Providing a decent UI to find data providers.
Auditing: The effort (time and cost) it takes to test the result and ensure we can use it. Since web data has legal and quality implications of its own, a dedicated marketplace for web data makes sense. We have been working a lot on this point at Data Boutique, as auditing (quality and legitimacy) is often a deal-breaker.
Negotiation: The effort it takes to understand and negotiate the conditions for the sale. Another big one. When a buyer can seamlessly test, buy, and refresh data from multiple vendors under the same roof of Terms and Conditions, it’s like removing roadblocks with a bulldozer. You realize an industry has a price issue when prices are never displayed upfront. We have always pursued transparency, as it only speeds up transactions at the end of the day.
Price paid: The actual price paid. We don’t believe the price is too high for web data. We believe the pricing model, inherited from consulting, is broken. We firmly believe consumption-based pricing is the best way to align buyer’s and seller’s interests.
Execution: The effort it takes to actually ingest the data and make something useful out of it. Almost every data marketplace handles this, mainly because captive marketplaces (Snowflake, Databricks, Tableau, Qlik, etc.) want buyers to use their other services, not the marketplace itself.
When properly orchestrated, a marketplace gives the buyer many advantages, and the transaction cost decreases (though not necessarily the price).
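A small sketch can show how the total transaction cost drops while the price stays the same. The five fields mirror the breakdown above; the amounts are hypothetical and expressed in one common unit (time converted to money):

```python
from dataclasses import dataclass


@dataclass
class BuyerEffort:
    """The five components of a buyer's transaction cost."""
    search: float
    auditing: float
    negotiation: float
    price_paid: float
    execution: float

    def total(self):
        return (self.search + self.auditing + self.negotiation
                + self.price_paid + self.execution)


# Hypothetical numbers: same dataset, same price, two ways of buying it.
direct = BuyerEffort(search=300, auditing=500, negotiation=400,
                     price_paid=1000, execution=200)
marketplace = BuyerEffort(search=20, auditing=50, negotiation=0,
                          price_paid=1000, execution=200)

print(direct.total())                        # 2400
print(marketplace.total())                   # 1270
print(direct.total() - marketplace.total())  # 1130 saved, price unchanged
```

In this illustration the whole saving comes from search, auditing, and negotiation. The price paid is identical in both cases.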
2. Lowering the CAC for sellers
Sellers have upsides here, too. Since the buyer is already there, the cost of reaching them is far lower than the cost of finding them from afar.
This holds more in some markets than in others: the goods or services need to be a commodity (many sellers), with a fragmented buy side.
The effect on the CAC is very relevant. Sellers can compete on value and price, attracting more buyers and lowering CAC.
The battle for leaner prices also has huge advantages for sellers: The reduction in price is more than counterbalanced by the growth in the Total Addressable Market (TAM) created.
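The claim that a price cut can pay for itself is simple arithmetic. Here is a sketch with hypothetical prices and buyer counts, where `break_even_buyer_multiple` (our own name) gives the growth in buyers needed to keep revenue flat:

```python
def revenue(price, buyers):
    """Total revenue from selling one dataset to a number of buyers."""
    return price * buyers


def break_even_buyer_multiple(old_price, new_price):
    """How many times more buyers are needed to match the old revenue."""
    return old_price / new_price


# Hypothetical scenario: the price drops by 60%, the buyer base grows 5x.
old_price, old_buyers = 100.0, 10
new_price, new_buyers = 40.0, 50

print(revenue(old_price, old_buyers))                    # 1000.0
print(revenue(new_price, new_buyers))                    # 2000.0
print(break_even_buyer_multiple(old_price, new_price))   # 2.5
```

With these numbers, any growth in buyers beyond 2.5x more than offsets the lower price. Whether a given marketplace delivers that growth is an empirical question, not something the sketch proves.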
We will discuss the benefits of marketplaces further in future posts, as many facets make this model particularly well suited to solving the data distribution problem.
About the Project
That was it for this week!
Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can browse the current catalog and add your request if a website is not listed. Saving datasets to your interest list helps sellers correctly size the demand for datasets and decide to join the platform.
More on this project can be found on our Discord channels.
Thanks for reading and sharing this.