About Data Boutique
Data Boutique is a web-scraped data marketplace.
If you’re looking for web data, there is a high chance someone is already collecting it. Data Boutique makes it easier to buy web data from them.
Join our platform to learn more about this project and discuss it with us.
Not everything can be planned in advance
Web data projects involve multiple parties: Those who collect data, those who transform it into something usable, and those who ultimately use it.
More people means more complexity, more planning, and a longer time-to-market for the project.
This becomes an issue with sudden, urgent requests.
When the time-to-market of a project exceeds the time available to solve the problem that caused it, web data becomes inefficient, even if it is effective in theory.
We created Data Boutique precisely to address this: Make every web data project viable in both cost and time, and unlock new value opportunities in Internet-gathered information.
The issues with planning
Planning is complex because we rarely have all the elements: It is not always easy to anticipate what data we will need and when we will need it.
Sometimes we have too little advance notice, and sometimes the problem requires data from websites we didn’t think of.
In those cases, even if web data could help, a project is too slow to implement (a separate post explains why it may also be expensive).
Here’s a general view of the gap between information needs and the current applications of web data projects.
What data to collect: Pre-determined or Variable
When the project has a long-term goal, its data needs are easier to anticipate: Price collection for dynamic pricing, job post collection for insights, and market alert notifications. This is what we call structural market monitoring.
In fact, these are the most frequent use cases for web-scraped data.
But information needs go well beyond this: Custom research, advisory projects, due diligence, or investment thesis evaluation. These use cases vary every time and involve different target companies, often in various industries.
When to collect: Frequent or Episodic
Another risk factor in web data projects is knowing how often we will use the data. The more frequent the use and the farther into the future we can see it, the more room we have to plan. The more isolated or sporadic the use case, the less we will be able to anticipate it.
Let’s say you are building a price comparison tool for your website, which will run daily as a permanent feature for years to come. If it takes two months to develop and deploy, it can still fit your purpose.
But if you need to compare prices for Black Friday week while the week is in progress, because you are losing sales, you don’t have the luxury of that timeframe.
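The two scenarios above come down to comparing how long the data takes to arrive with how long the need lasts. Here is a minimal sketch of that check in Python. All names are hypothetical, and the numbers are illustrative: the 60-day build stands in for the "two months" above, while the seven-day window and the three-year horizon are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DataNeed:
    """Hypothetical description of a need for web data."""
    name: str
    window_days: int  # days before the need expires (illustrative values)


def fits_window(need: DataNeed, lead_time_days: int) -> bool:
    """True when the data can be delivered before the need expires."""
    return lead_time_days <= need.window_days


CUSTOM_BUILD_DAYS = 60  # assumption: "two months to develop and deploy"
PRE_SCRAPED_DAYS = 0    # assumption: data already collected and ready

permanent_tool = DataNeed("daily price comparison", window_days=3 * 365)
black_friday = DataNeed("Black Friday price check", window_days=7)

for need in (permanent_tool, black_friday):
    print(
        need.name,
        "| custom build fits:", fits_window(need, CUSTOM_BUILD_DAYS),
        "| pre-scraped fits:", fits_window(need, PRE_SCRAPED_DAYS),
    )
```

Under these assumptions, the custom build only fits the permanent tool, while data that is already collected fits both.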
How Data Boutique addresses this
Data Boutique is set up to address two problems: We do not know in advance which websites will need to be crawled, nor how urgently the data will be needed. Here’s how we handle them.
Solving data variance (the What)
Having complete coverage of all web-scrapable data, structured, cleaned, checked, and ready to go, is impossible. So we rule this out.
But there’s something else we can do: We set incentives to have the highest coverage for the most requested websites. This way, we maximize the chance of meeting the next incoming requests for data.
This is more than playing the odds. Covering the most requested websites in an industry improves the likelihood that an incoming request is for a website already listed. It also means that the listed websites, being the most requested, are likely to be highly relevant for that industry. So even when we do not list the exact website somebody had in mind when coming to Data Boutique, they can still find value in what is currently listed.
Let’s use an example:
Let’s say a business analyst is writing a commercial due diligence report for an M&A deal involving a European fashion brand. If the brand is small, it is unlikely that Data Boutique has the brand’s official website in its catalog. The business analyst could request it, but she’s in a hurry and needs to deliver by tomorrow. But she can certainly find something: The most relevant multi-brand retailers in the industry, like Boohoo, Asos, or Zalando. That is not what she was looking for in the first place, but it is really close. And, most of all, it can fit her “I need it yesterday” schedule.
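The idea of prioritizing the most requested websites can be illustrated with a short Python sketch. This is not how the platform computes anything: the function names are hypothetical and the request log is made up, with retailer names borrowed from the example above. It only shows how listing a few popular websites can serve a large share of requests.

```python
from collections import Counter


def top_requested(requests, k):
    """Return the k most requested websites, most requested first."""
    return [site for site, _ in Counter(requests).most_common(k)]


def coverage(requests, listed):
    """Share of requests that target a website already listed."""
    if not requests:
        return 0.0
    hits = sum(1 for site in requests if site in listed)
    return hits / len(requests)


# Made-up request log: one entry per request for a website.
saved_requests = [
    "Zalando", "Asos", "Zalando", "Boohoo",
    "small brand A", "Asos", "Zalando", "small brand B",
]

listed = set(top_requested(saved_requests, 2))
share = coverage(saved_requests, listed)
print(sorted(listed), f"{share:.1%} of requests covered")
```

In this made-up log, listing only the two most requested websites out of five covers five of the eight requests.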
Solving the immediacy (the When)
There are not many ways to handle the fact that the user may need the data with zero warning: Data must be pre-scraped and ready on the platform at any time.
Immediate data availability: Data is always ready, three clicks away, for any customer to get.
The way we see it, this is the only way to handle all those 3 a.m. last-minute issues that web data can solve but that cannot wait until tomorrow.
This implies an additional effort on both the seller side and the platform side: Data needs to be constantly refreshed, checked, and ready to use, regardless of whether someone is buying it.
But there are advantages as well: Having a continuous stream of data allows superior quality assurance and higher trust from the final user.
Join the Project
Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can browse the current catalog and add your request if a website is not listed. Saving datasets to your interest list allows sellers to correctly size the demand for those datasets and decide whether to join the platform.
More on this project can be found on our Discord channels.
Thanks for reading and sharing this.