About Data Boutique
Data Boutique is a web-scraped data marketplace.
If you’re looking for web data, there is a high chance someone is already collecting it. Data Boutique makes it easier to buy web data from them.
Join our platform to learn more about this project and interact with the community.
Like-for-Like Price Differences
Today we are looking into a common use case for web data: price comparison analysis between retailers.
A product can be offered on more than one website and have different prices on each of them. This price gap plays a key role in consumers’ purchasing decisions, with obvious business implications for brands and retailers.
In this post, we’ll see how to obtain the information needed to analyze like-for-like price differences, a so-called apples-to-apples comparison: the same product, offered in the same geography, at the same time, under the same conditions.
This is the starting point for many use cases: Dynamic pricing, selective distribution monitoring, market analysis, and more.
The 3 Input Datasets Needed
Whatever your purpose is, there are three elements your application needs as input:
Retailer’s price list: The full list of the retailer's products, with their prices, in the scope you want to analyze.
Reference price list: The full list of products with the prices you want to compare against. This can be another retailer’s price list, a direct-to-consumer website’s price list, or an internal catalog.
Link table: The two lists rarely have the same product codes; we’ll need a table to link them.
Once you have these inputs, you can build your application and deliver the results with any technology of your choice, such as AI assistants, e-mail alerts, Excel spreadsheets, BI visualization tools, and more.
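To make the role of each input concrete, here is a minimal Python sketch of how the three datasets fit together. It assumes a hypothetical schema (a `PriceRow` with product code, country, currency, and price) and a link table expressed as a plain dictionary from retailer code to reference code; real datasets, including Data Boutique's, will have their own field names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PriceRow:
    """One product price as it appears in a price list (hypothetical schema)."""
    product_code: str
    country: str
    currency: str
    price: float


def price_differences(
    retailer: List[PriceRow],
    reference: List[PriceRow],
    link_table: Dict[str, str],
) -> List[Tuple[str, str, str, float]]:
    """Compare each linked retailer price with its reference price.

    Returns (retailer_code, reference_code, country, gap_pct) tuples, where a
    negative gap_pct means the retailer is cheaper than the reference.
    Only rows with the same country and currency are compared.
    """
    # Index reference prices by (code, country, currency) for fast lookups.
    ref_index = {(r.product_code, r.country, r.currency): r.price for r in reference}
    results = []
    for row in retailer:
        ref_code = link_table.get(row.product_code)
        if ref_code is None:
            continue  # no link: the product cannot be compared
        ref_price = ref_index.get((ref_code, row.country, row.currency))
        if ref_price is None or ref_price <= 0:
            continue  # no usable reference price for this geography and currency
        gap_pct = (row.price - ref_price) / ref_price * 100
        results.append((row.product_code, ref_code, row.country, round(gap_pct, 2)))
    return results
```

For example, a retailer row priced at 90.0 EUR and linked to a reference row at 100.0 EUR in the same country yields a gap of -10.0%, meaning the retailer is 10% cheaper. Products without a link, or without a reference price in the same geography and currency, are simply skipped, which is why the overlap between the lists matters so much.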
Let’s look at the three input datasets in more detail.
1. Retailer’s Price List
This is the list of prices you want to monitor, as displayed by the “target” retailer.
How to get this data:
Data Boutique: This is an ideal use case. Data Boutique provides web-scraped data from many retailers with independently verified quality, completeness, and granularity, at a competitive cost. Just browse the catalog for the retailer you are interested in. If a retailer is not listed, you can use the “request” feature.
Web scraping: You can do the web scraping yourself when the time, quality, and cost advantages of Data Boutique matter less to you, or when the data is not yet available in the catalog and you can’t wait for it to be added.
Elements to pay attention to:
Geography: Prices change by geography due to shipping costs, taxes, or brand strategies. Make sure the retailer and reference price lists cover the same geographies.
Currency: Global e-commerce websites often sell in multiple currencies, not necessarily the original currency for that country. Pay attention to the currency used. Here is an example of the fashion retailer mrporter.com.
On Data Boutique, prices are shown in both local currency and EUR to save you the conversion hurdle.
Price conditions: Prices can be compared before or after discounts. Remember to keep the comparison apples-to-apples; generally, we advise separating full-price monitoring from discount monitoring.
Product scope: Be aware of the retailer’s product scope. Consider limiting the brands or website sections to your needs. Unless otherwise specified, Data Boutique's scope is always the entire website content.
Product granularity: Are products available in all their variants (e.g., color, configuration, size), or is only one variant available? This will affect the number of comparisons you will be able to obtain.
Frequency of refresh: How often do you need to refresh the price list? Depending on your goals, this can range from monthly to daily. Data Boutique offers a user-friendly frequency adjustment feature that you can change as you go, allowing you to build a proof of concept (PoC) before switching on the regular refresh rate.
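The points above usually translate into a cleaning step before any comparison. The sketch below assumes a hypothetical `ScrapedPrice` record with separate full and sale prices, and a caller-supplied table of EUR exchange rates (no real rate source is implied). It keeps a single geography, sets discounted items aside for separate monitoring, and converts the remaining full prices to EUR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ScrapedPrice:
    """One scraped product price (hypothetical schema)."""
    product_code: str
    country: str
    currency: str
    full_price: float
    sale_price: Optional[float] = None  # None when the item is not discounted


def prepare_full_price_list(
    rows: Iterable[ScrapedPrice], country: str, eur_rates: Dict[str, float]
) -> Dict[str, float]:
    """Return {product_code: full price in EUR} for one geography.

    eur_rates maps a currency code to the EUR value of one unit of that
    currency; it is supplied by the caller (hypothetical input).
    """
    prepared = {}
    for row in rows:
        if row.country != country:
            continue  # compare homogeneous geographies only
        if row.sale_price is not None:
            continue  # discounted items are monitored separately
        rate = eur_rates.get(row.currency)
        if rate is None:
            raise ValueError(f"No EUR rate for currency {row.currency!r}")
        prepared[row.product_code] = round(row.full_price * rate, 2)
    return prepared
```

Raising an error on a missing rate, rather than silently skipping the row, is a deliberate choice: a product that disappears from the comparison because of a currency gap is easy to miss.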
2. Reference Price List
The choice of the reference price is crucial, as only the overlap between the two lists can be compared.
Example: We want to compare a fashion brand's direct-to-consumer prices with those of a multi-brand fashion retailer. The retailer’s price list is collected from the retailer's website. If the reference is the brand’s direct-to-consumer website, the comparison is limited to products that both websites chose to sell, which reduces the overlap. If, instead, you use the brand's internal catalog, the overlap will be much larger, because you also have prices for products that the direct-to-consumer website does not sell.
Where to get it:
Internal catalog: You can access a brand's or retailer's internal catalog only if the brand or retailer grants you access to it. This would be the preferred choice, but it has strong limitations (for example, you cannot do this for competitors).
Data Boutique: If the reference price list comes from a website, it may already be available on Data Boutique, which is the fastest option. If it is not listed, you can request it.
Web scraping: You could obtain the retailer’s dataset from Data Boutique, saving time and web scraping expenses, and still collect the reference price directly. This mixed approach can be very effective because large retailers’ websites are often costly to scrape internally.
Pay attention to:
Geography: It must be consistent with the retailer’s price list.
Currency: It must be consistent with the retailer’s price list, or you must add a currency conversion step. Data Boutique pricing datasets help with that because EUR conversion is already provided.
Price conditions: Apples-to-apples. If the retailer’s price excludes discounts and promotions, so must the reference price.
Product scope: This is linked to the choice of the reference source.
Example: In the eyewear industry, it is common for a brand’s website to sell only sunglasses, while eyewear retailers sell both sunglasses and optical frames. If our reference price list is the brand’s website, the overlap will be limited to sunglasses.
Product granularity: As discussed for the retailer’s price list, be aware of how variants are handled in your catalog/reference price list.
Frequency of refresh: It can differ from that of the retailer’s price list (internal product catalogs change more slowly than retailers’ websites), but a similar update frequency is recommended.
Overlap with Retailer’s Price List: Be aware of the expected overlap, and verify if this aligns with your project's mission. It would be unpleasant to discover that only a 5% overlap was found when the intention was to monitor 100% of the retailer’s products.
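Measuring the overlap before building the application avoids that surprise. Here is a minimal sketch, assuming both lists are reduced to collections of product codes and the link table is a dictionary from retailer code to reference code (all names are hypothetical):

```python
from typing import Dict, Iterable


def overlap_report(
    retailer_codes: Iterable[str],
    reference_codes: Iterable[str],
    link_table: Dict[str, str],
) -> dict:
    """Share of the retailer's in-scope products that can be compared."""
    retailer_set = set(retailer_codes)
    reference_set = set(reference_codes)
    # A product is comparable only if it is linked AND the linked code
    # actually exists in the reference price list.
    comparable = {
        code for code in retailer_set if link_table.get(code) in reference_set
    }
    coverage = len(comparable) / len(retailer_set) if retailer_set else 0.0
    return {
        "retailer_products": len(retailer_set),
        "comparable_products": len(comparable),
        "coverage_pct": round(coverage * 100, 1),
        "missing": sorted(retailer_set - comparable),
    }
```

For instance, with four retailer products of which two link to codes present in the reference list, `coverage_pct` is 50.0 and `missing` lists the other two, a useful starting point for deciding whether the reference source fits the project's mission.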
3. Link Table
We now have retailer and reference prices. Are we good to go?
No. Things get rough here: The two datasets rarely use the same product coding.
There is the GTIN (Global Trade Item Number), so why not use it? You would expect the entire consumer goods retail sector to use barcodes… but in e-commerce this is rarely the case.
We will go into detail in a future post, but websites always use their own internal product codes. In some cases, however, other reference codes such as SKU, ASIN, and GTIN/EAN are hidden in the HTML code.
So how do we get the link between the retailer’s product code and the reference product code?
A) The HTML has a common-linking product code
Data Boutique: If any of these codes (SKU, ASIN, GTIN/EAN) appears on the product page, Data Boutique data (so far, E0002 and E0003) includes all additional codes visible on the page or within its HTML code.
Web scraping: If this information is available on the page but not yet present in the catalog, scraping it yourself is an option (although requesting it on Data Boutique is still preferred).
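When you scrape the codes yourself, one common place to look is structured data embedded in the page. The sketch below assumes the product page publishes schema.org `Product` markup as JSON-LD (many sites do, many do not) and uses only the Python standard library to collect the `sku`, `mpn`, and `gtin*` properties. ASINs are Amazon-specific and not part of schema.org, so they would need site-specific parsing. A GS1 check-digit test is included to discard GTINs that were mistyped or truncated. This is an illustration, not how Data Boutique extracts codes.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List

# schema.org Product properties that may carry codes shared across websites.
CODE_KEYS = ("sku", "mpn", "gtin", "gtin8", "gtin12", "gtin13", "gtin14")


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _is_product(item: dict) -> bool:
    kind = item.get("@type")
    return kind == "Product" or (isinstance(kind, list) and "Product" in kind)


def extract_product_codes(html: str) -> Dict[str, str]:
    """Return product codes found in schema.org Product JSON-LD, if any."""
    collector = JsonLdCollector()
    collector.feed(html)
    codes: Dict[str, str] = {}
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup is common; skip it
        if isinstance(data, dict) and "@graph" in data:
            data = data["@graph"]  # some sites wrap entities in a graph
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and _is_product(item):
                for key in CODE_KEYS:
                    if key in item:
                        codes[key] = str(item[key])
    return codes


def is_valid_gtin(code: str) -> bool:
    """GS1 mod-10 check-digit test for GTIN-8, -12, -13, and -14."""
    if len(code) not in (8, 12, 13, 14) or not all(c in "0123456789" for c in code):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights 3, 1, 3, ... start at the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check
```

`extract_product_codes(html)` returns a dictionary such as `{"sku": ..., "gtin13": ...}` for pages that carry the markup and an empty dictionary otherwise; `is_valid_gtin("4006381333931")` returns `True`.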
B) There is no common-linking product code in HTML
Algorithmically generated: If there are features (image, title, description) that AI or other algorithms can use to identify the same object in two different lists, this is your best shot. But… only in theory: in real-world use cases, recognition errors are still frequent, making these approaches unfit for most purposes.
Manually generated: Although it may sound shocking, in many industries volumes are too small to properly train AI, images are not enough, or product titles are too generic… Someone has to do this manually, or you can find a provider that sells this mapping. This is a form of data enrichment.
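To illustrate why algorithmic matching tends to produce suggestions rather than final answers, here is a deliberately simple title-similarity sketch using Python's standard `difflib`. It is not an AI model and not a method used by Data Boutique; the normalization rule and the 0.8 threshold are arbitrary assumptions, and every suggestion would still need human review.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize_title(title: str) -> str:
    """Lowercase and keep only letters and digits, so punctuation is ignored."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def suggest_matches(
    retailer_titles: Dict[str, str],
    reference_titles: Dict[str, str],
    threshold: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Propose the best reference match per retailer product for manual review.

    Both inputs map product_code -> title. Returns (retailer_code,
    reference_code, score) tuples whose similarity is at or above threshold.
    """
    normalized_ref = {code: normalize_title(t) for code, t in reference_titles.items()}
    suggestions = []
    for r_code, r_title in retailer_titles.items():
        r_norm = normalize_title(r_title)
        best_code, best_score = None, 0.0
        for ref_code, ref_norm in normalized_ref.items():
            score = SequenceMatcher(None, r_norm, ref_norm).ratio()
            if score > best_score:
                best_code, best_score = ref_code, score
        if best_code is not None and best_score >= threshold:
            suggestions.append((r_code, best_code, round(best_score, 3)))
    return suggestions
```

The nested loop compares every pair of titles, so it is only practical for small catalogs. Generic titles such as "Black T-shirt" will score highly against many different products, which is exactly the kind of recognition error described above.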
Things to pay attention to:
Product code transformations: Even when you find the same product code on both sides, it is surprisingly common for it to be formatted differently. A brief code analysis can help you find the right transformation (see an example here of Jimmy Choo's official website compared with the same product on Blickers.com).
One-to-many mapping opportunity: Depending on your product category, you may increase the overlap between the two price lists by mapping one variant of the reference list to multiple variants of the retailer’s list, as long as the variant does not affect the price. This is also effective when some variants are missing from one of the two lists.
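Both points can be sketched in a few lines of Python. The normalization rule (uppercase, strip spaces, hyphens, dots, slashes, and underscores) and the `variant_key` function are hypothetical: the right transformation has to come from analyzing the actual codes of the two websites. When `variant_key` is supplied, codes that differ only in a non-price-sensitive part (for instance, a size suffix) collapse onto the same reference code, which gives the one-to-many mapping.

```python
import re
from typing import Callable, Dict, Iterable, Optional


def normalize_code(code: str) -> str:
    """Hypothetical normalization: drop common separators and uppercase."""
    return re.sub(r"[\s\-./_]", "", code).upper()


def build_link_table(
    retailer_codes: Iterable[str],
    reference_codes: Iterable[str],
    variant_key: Optional[Callable[[str], str]] = None,
) -> Dict[str, str]:
    """Link retailer codes to reference codes after normalization.

    variant_key (optional) reduces a normalized code to the part that
    determines price; several retailer variants can then share one
    reference code (one-to-many mapping).
    """
    key = variant_key or (lambda c: c)
    ref_by_key: Dict[str, str] = {}
    for ref in reference_codes:
        # Keep the first reference code seen for each key.
        ref_by_key.setdefault(key(normalize_code(ref)), ref)
    links = {}
    for code in retailer_codes:
        ref = ref_by_key.get(key(normalize_code(code)))
        if ref is not None:
            links[code] = ref
    return links
```

For example, with hypothetical codes, `build_link_table(["AB12CD34-S", "AB12CD34-M"], ["AB12CD34"], variant_key=lambda c: c[:8])` links both size variants to the single reference code, while without `variant_key` neither variant is linked.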
Example
Now you have all the necessary input data and can build your application. Let’s recap with an example: measuring price differences between the Italian fashion brand Pinko and the multi-brand retailer Farfetch.
Retailer’s price list
Pinko’s product prices on Farfetch can be found in Data Boutique’s Farfetch data. We chose the USA, the UK, and France as illustrative geographies.
Here are the links to Data Boutique’s files:
Reference Price List
Pinko’s own website data (including prices) can also be found on Data Boutique. Here’s the link to Data Boutique’s files:
Link Table
Luckily, the Farfetch product detail page (PDP) shows a code that matches Pinko’s product code.
Here are the links to Pinko’s page for product 1H20ZUY7SPC03Q and to the corresponding Farfetch page (the links may have expired by the time you read this email), to help you understand why we need link tables.
The Farfetch link table is not yet available on Data Boutique at the time of writing, but we are planning to add it. Please contact us if you are interested in a link table for any brand on Farfetch.
Join the Project
That was it for this week!
Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can browse the current catalog and add your request if a website is not listed. Saving datasets to your interest list allows sellers to correctly size the demand for datasets and join the platform.
More on this project can be found on our Discord channels.
Thanks for reading and sharing this.