About Data Boutique
Data Boutique is a data marketplace focused on web scraping. We bring together those who collect data with those who know how to use it.
What’s New on Data Boutique
PREFACE TO THIS EDITION
This edition opens a new series of posts covering relevant content shared on the web-scraped data exchange over the past month.
Table of Contents
1. For data buyers
Case Study: How to match products on Farfetch
New Datasets: Cosmetics on Sephora
The Cost of Data: Asked price stats by schema
2. For data sellers
Feature: Your Profile Pages on Data Boutique
Tips: Automating Upload and Validations on the Data Boutique S3 Bucket
1. For data buyers
Case Study: How to match products on Farfetch
Farfetch is arguably the most important marketplace in the online fashion and luxury space, which makes price and promotion monitoring on this website especially valuable.
In price monitoring, products must be matched across retailers to measure potential price misalignments between platforms. Multibrand retailers typically rely on internal product codes because the apparel industry rarely uses standard codes such as EAN or GTIN, and this makes matching harder.
Farfetch exposes an additional code, the “ID Brand” code, which is the closest thing we have to a unique product identifier (SKU). We have a dedicated schema to capture any additional code an e-commerce website may expose: the E-ADD-CODE-0001 schema.
Why use the ID Brand code
It makes price comparison with other websites easy, especially with official brand websites. You can use it to match Farfetch against other websites available on Data Boutique, or against product list files you already have.
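Once two price lists share the ID Brand code as a key, measuring price misalignment reduces to a join on that code. The Python below is a minimal sketch, assuming two hypothetical dictionaries that map ID Brand codes to prices in the same currency; it is an illustration, not a Data Boutique tool.

```python
def price_gaps(farfetch: dict[str, float], brand_site: dict[str, float]) -> dict[str, float]:
    """Relative price gap (Farfetch vs. brand site) for every ID Brand code present in both lists.

    Assumes both inputs map an ID Brand code to a price in the same currency.
    A positive value means Farfetch is more expensive than the brand site.
    """
    gaps = {}
    for code, ff_price in farfetch.items():
        ref_price = brand_site.get(code)
        if ref_price:  # skip codes missing from the brand site (and zero prices)
            gaps[code] = (ff_price - ref_price) / ref_price
    return gaps


# Hypothetical prices, for illustration only.
farfetch_prices = {"MA10554G0HMJ5": 540.0, "XYZ123": 300.0}
brand_prices = {"MA10554G0HMJ5": 600.0}
print(price_gaps(farfetch_prices, brand_prices))  # {'MA10554G0HMJ5': -0.1}
```

Codes found on only one side are simply dropped; in a real monitoring job you would probably also report them as unmatched.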
How to use the E-ADD-CODE-0001 schema
The E-ADD-CODE-0001 schema is essentially a lookup (cross-reference) table: it contains the list of product codes used in other schemas, such as E0001, and associates each of them with its “ID Brand” code.
In this example, it associates the product_code 20267286 with the ID Brand code MA10554G0HMJ5.
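In practice, using the lookup is a join on product_code. The sketch below uses Python's standard csv module and assumes both files are CSV exports with a product_code column; the name of the column holding the ID Brand code (`additional_code`) and the `price` column are hypothetical, so check them against the actual schema definitions.

```python
import csv
import io
from typing import TextIO


def attach_brand_codes(prices: TextIO, lookup: TextIO,
                       code_column: str = "additional_code") -> list[dict[str, str]]:
    """Add the ID Brand code to each E0001 price row, matched on product_code.

    `code_column` is a hypothetical name for the ID Brand column in the
    E-ADD-CODE-0001 file. Rows without a match get an empty string.
    """
    brand_by_product = {row["product_code"]: row[code_column] for row in csv.DictReader(lookup)}
    enriched = []
    for row in csv.DictReader(prices):
        row["id_brand"] = brand_by_product.get(row["product_code"], "")
        enriched.append(row)
    return enriched


# Tiny in-memory example; real files would be opened with open(path, newline="").
prices_csv = io.StringIO("product_code,price\n20267286,540.00\n99999999,120.00\n")
lookup_csv = io.StringIO("product_code,additional_code\n20267286,MA10554G0HMJ5\n")
for row in attach_brand_codes(prices_csv, lookup_csv):
    print(row["product_code"], row["id_brand"] or "(no match)")
```

Building a dictionary from the lookup first keeps the join linear in the size of both files, which matters little for a sample but helps on full monthly extractions.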
The importance of having a separate schema
Why a separate schema? Two reasons:
Cost: ID Brand codes are visible on Product Detail Pages (PDPs), which are more costly to scrape. Keeping them in a separate schema leaves the price information in a cost-efficient schema. The E-ADD-CODE-0001 schema can be combined with an E0001 schema (plain price data): the former refreshed once a month or less, the latter more frequently. This cuts the cost roughly by an order of magnitude.
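How large the saving is depends on how much more a PDP run costs and how often prices are refreshed. The back-of-the-envelope sketch below uses made-up relative costs (a PDP run costing ten times a price-only run, daily price refreshes, one PDP run per month) purely to show the arithmetic; substitute your own figures.

```python
def monthly_scraping_cost(listing_run: float, pdp_run: float, price_runs: int,
                          pdp_runs: int = 1) -> tuple[float, float]:
    """Compare two hypothetical set-ups over one month.

    all_pdp: every price refresh scrapes the Product Detail Pages.
    split:   prices come from a cheaper price-only schema (such as E0001), and PDPs
             are scraped only `pdp_runs` times for the E-ADD-CODE-0001 lookup.
    """
    all_pdp = price_runs * pdp_run
    split = price_runs * listing_run + pdp_runs * pdp_run
    return all_pdp, split


# Hypothetical relative costs: one PDP run = 10x a price-only run; 30 price refreshes a month.
all_pdp, split = monthly_scraping_cost(listing_run=1.0, pdp_run=10.0, price_runs=30)
print(all_pdp, split, all_pdp / split)  # 300.0 40.0 7.5
```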
Easier to find, easier to request: With a separate schema it’s easier to spot when a website has this information available, with no need to investigate the sample file. Buyers see it immediately and, when it is missing, they can request an E-ADD-CODE-0001 schema for that website with no further specification. Data discovery gets a lot smoother this way.
New Datasets: Cosmetics on Sephora
The cosmetics industry is a hot sector for web scraping. The first dataset in this sector to appear on Data Boutique covers Sephora, the leading beauty and cosmetics retailer owned by LVMH. Sephora offers beauty, skincare, and fragrance products from a range of luxury brands.
The dataset currently lists product prices for France and Poland and is offered by WebDataWatch, one of the most reliable data sellers on Data Boutique.
The Cost of Data: Asked price stats by schema
Prices on Data Boutique are set jointly by buyers and sellers. The main factors that influence them are:
Cost of extraction
Demand volume and stability
Reputation of seller
Since the cost of extraction varies significantly with the website and the depth of scraping required, it helps to look at price statistics by schema.
Here is a snapshot of what prices look like today on Data Boutique.
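If you want to compute similar statistics yourself, for example over the datasets you are evaluating, a minimal Python sketch follows. It assumes you already have a list of (schema, asked price) pairs gathered however you like; the sample values are hypothetical and are not actual Data Boutique prices.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable


def price_stats_by_schema(listings: Iterable[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Group (schema, asked_price) pairs and return min, median, max, and count per schema."""
    by_schema: dict[str, list[float]] = defaultdict(list)
    for schema, price in listings:
        by_schema[schema].append(price)
    return {
        schema: {"min": min(p), "median": median(p), "max": max(p), "count": len(p)}
        for schema, p in by_schema.items()
    }


# Hypothetical asked prices, for illustration only.
sample = [("E0001", 20.0), ("E0001", 35.0), ("E0001", 50.0), ("E-ADD-CODE-0001", 15.0)]
for schema, stats in price_stats_by_schema(sample).items():
    print(schema, stats)
```

The median is reported alongside the extremes because a few unusually deep or expensive extractions can pull an average far from what a typical dataset costs.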
2. For data sellers
Feature: Your Profile Pages on Data Boutique
Starting this month, sellers can have a profile page hosted on Data Boutique that groups together all the datasets they offer.
We want data providers to be able to showcase their work and attract leads. This can translate into job opportunities inside and outside of Data Boutique.
Profile Pages are meant to:
Establish the seller’s brand
Showcase their work
Enable lead generation
Provide a space to share relevant news, marketing material, and updates.
How to activate the profile page
The platform is evolving quickly, so these instructions may change; keep this in mind if you are reading this post at a later date:
Once logged in (sign-up is free), pick “Selling” from the top menu bar and then “My Company Profile”.
From here you can manage your public messages, edit the information displayed on your Profile Page, and, most importantly, change the settings to:
Make your Profile Page public
Accept direct messages from any Data Boutique user
You are free to turn each of these options on or off.
Tips: Automating Upload and Validations on the Data Boutique S3 Bucket
When selling data on Data Boutique, data providers need to upload their files to an S3 bucket. An automatic validator tests the content and returns an approved/refused response.
A huge shoutout goes to Riccardo Lunardi, a seller from Italy, who shared some Python code to automate this process. Riccardo created a Python class that uploads files to the bucket and checks whether they were approved (see his article for the details). Thanks a lot, Riccardo 🤗.
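Riccardo's class is described in his article and is not reproduced here. As a rough, independent sketch of the same idea, the Python below uploads a folder of files with boto3's S3 client method `upload_file(Filename, Bucket, Key)` and then polls for a verdict. The bucket, prefix, and especially `check_status` are hypothetical: this post does not describe how the validator reports its approved/refused response, so that function is a placeholder you would have to implement.

```python
import time
from pathlib import Path
from typing import Callable


def upload_folder(s3_client, bucket: str, prefix: str, folder: str,
                  pattern: str = "*.csv") -> list[str]:
    """Upload every file in `folder` matching `pattern` under `prefix`; return the S3 keys used.

    `s3_client` is expected to behave like boto3.client("s3"), whose
    upload_file(Filename, Bucket, Key) method is called here.
    """
    keys = []
    for path in sorted(Path(folder).glob(pattern)):
        key = f"{prefix.rstrip('/')}/{path.name}"
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


def wait_for_validation(check_status: Callable[[str], str], key: str,
                        timeout_s: float = 600.0, poll_s: float = 15.0) -> str:
    """Poll a hypothetical `check_status(key)` until it returns "approved" or "refused".

    Returns "timeout" if no verdict arrives within `timeout_s` seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = check_status(key)
        if status in ("approved", "refused"):
            return status
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_s)


# Example usage (bucket, prefix, and my_status_check are placeholders):
# import boto3
# s3 = boto3.client("s3")
# for key in upload_folder(s3, "your-bucket", "your/prefix", "./output"):
#     print(key, wait_for_validation(my_status_check, key))
```

Passing the S3 client in as a parameter, rather than creating it inside the function, keeps credentials and region configuration in the caller's hands and makes the upload step easy to test with a stand-in object.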
About the Project
Data Boutique aims to increase web data adoption by creating a win-win environment for data sellers and buyers. Join our community (it’s free!) to be part of it. You can find more on our Discord channels.
That was it for this month. Thanks for reading and helping our community grow.