<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Boutique]]></title><description><![CDATA[Web-Scraped data marketplace]]></description><link>https://blog.databoutique.com</link><image><url>https://substackcdn.com/image/fetch/$s_!5i-6!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F01280021-8d9b-4d20-8d73-cf0d6d150ed2_1064x1064.png</url><title>Data Boutique</title><link>https://blog.databoutique.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 06 Apr 2026 06:57:44 GMT</lastBuildDate><atom:link href="https://blog.databoutique.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[DataBoutique.com]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[databoutique@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[databoutique@substack.com]]></itunes:email><itunes:name><![CDATA[Andrea Squatrito]]></itunes:name></itunes:owner><itunes:author><![CDATA[Andrea Squatrito]]></itunes:author><googleplay:owner><![CDATA[databoutique@substack.com]]></googleplay:owner><googleplay:email><![CDATA[databoutique@substack.com]]></googleplay:email><googleplay:author><![CDATA[Andrea Squatrito]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Using Historical Data: Basic Knowledge]]></title><description><![CDATA[Elements to consider when using historical data from web scraping]]></description><link>https://blog.databoutique.com/p/using-historical-data-basic-knowledge</link><guid 
isPermaLink="false">https://blog.databoutique.com/p/using-historical-data-basic-knowledge</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Thu, 27 Jun 2024 04:54:45 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Using Historical Data: Basic Knowledge</strong></h1><h4>Elements to consider when using historical data from web scraping</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="3999" height="2666" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2666,&quot;width&quot;:3999,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;white and black train toy&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="white and black train toy" title="white and black train toy" srcset="https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1598363432216-501262ed9cee?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHw5fHxoaXN0b3J5JTIwYm9va3xlbnwwfHx8fDE3MTkzMDUxMzB8MA&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Markus Winkler</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><h1>The Context</h1><p>DataBoutique.com <a href="https://blog.databoutique.com/p/historical-data-now-available-on">recently enabled historical data</a> access, a long-awaited feature that unlocks value for both data buyers and sellers.</p><p>Simply put, it allows one-click access to past data, with different options in terms of cost and granularity, while keeping the same data schemas used for current data collections.</p><div><hr></div><h2>When to use historical data</h2><h3>See the past (to predict the future)</h3><p>The most common question a historical dataset can answer is &#8220;How were things at a specific point in time?&#8221; or &#8220;When did this thing start?&#8221; Ideally, it also captures trends that help us forecast the future more clearly: &#8220;Is the price of a product growing?&#8221; &#8220;Is the number of hotels on that reservation website rising?&#8221;</p><p>A variation on this is to simply check what happened on a particular date or who did something first. &#8220;Was this product cheaper on that day?&#8221; &#8220;Which website started the discounts first?&#8221; &#8220;Was this retailer already selling this product before that date?&#8221; And so on.</p><p>Websites like the <a href="https://wayback-api.archive.org/">Wayback Machine</a> are inadequate for proper time-tracking of fast-moving websites, like e-commerce or reservation platforms, so web-scraped historical data is the only way to answer these questions.
</p><h3>Back-test hypotheses</h3><p>We can take it one step further and look at the past to test hypotheses. &#8220;Does heavy discounting on an e-commerce site anticipate poor website performance?&#8221; &#8220;Was the competing retailer running discounts when my website suffered a decline in sales?&#8221; &#8220;Are those two brands syncing their price-change strategies?&#8221;</p><p>This approach is more complex than just &#8220;looking at the past,&#8221; as it often requires looking for a statistical correlation between two phenomena, a more granular dataset, and longer timeframes for testing.</p><p>Longer timeframes imply that either you or someone in your organization had enough foresight to start scraping that website two years before you needed it, or you find someone who has already been doing it, consistently and continuously (<em>that&#8217;s</em> what Data Boutique is for).</p><h3>Train A.I. (and other fun stuff)</h3><p>An even more advanced use case of historical data is to train automations: once you&#8217;ve seen the past and tested your hypotheses, you can train algorithms, AI, or other decision-making processes to take action when specific conditions occur. Stop discounting as soon as the competitors stop, dynamically adjust prices, automatically buy an item when price conditions are met, or raise a red flag when a distributor does something they are not supposed to.</p><p>Whatever you want to build, test it on the past, or (recommended) train AI to do that for you.</p><div><hr></div><h2>Things to consider</h2><p>Here&#8217;s a list of essential elements to remember when approaching historical data. The topic is more complex than this, but we&#8217;ll start with the building blocks:</p><h3>History length</h3><p>Arguably, the most basic element to consider is when the collection started. 
Now, with traditional datasets (financial transactions, stock prices, air temperature, etc.), we are used to seeing very long time series dating back decades. </p><p>In web scraping, the situation is different, to say the least. With few notable exceptions (data providers targeting a single website for years), it is not uncommon to find datasets with less than a year, or even just a few months, of history. This is inherent to web scraping: it costs money to keep collecting data from a website, and there are simply too many websites to choose from. While scraping today&#8217;s website content can be considered a commodity (it doesn&#8217;t really depend on who is doing it, as long as quality standards are met), historical data is a differentiating factor.</p><h3>Frequency</h3><p>Historical data can be offered at different levels of granularity. One scan per month can be enough in some cases, while others require higher frequencies (weekly or daily).</p><p>The finer the granularity, the more information (and noise) can be found, but the data is also heavier to manage. On Data Boutique, we offer three levels:</p><ul><li><p><strong>Monthly</strong>: One snapshot per month (12 a year). Ideal for simple research, long-term trend analysis, and initial exploration before finer work on more granular frequencies (as it&#8217;s cheaper).</p></li><li><p><strong>Weekly</strong>: One snapshot per week (52 a year), making it approximately four times larger than the monthly option. It&#8217;s quite bulky but perfectly suitable for most back-testing and training purposes. Given its higher cost, we recommend first trying a cheaper run with the monthly option as an effective way to assess its potential.</p></li><li><p><strong>Daily</strong>: The most granular historical dataset we allow on the platform. This is for heavy-duty usage (like some revenue-estimating projects).</p></li></ul><h3>Quality Factors</h3><p>What quality elements impact time series? 
Here are a few:</p><ul><li><p><strong>Completeness</strong>: Are data points missing? Are there significant gaps in the collection? A continuous collection is preferable, but as experience tells us, web scraping has quite a bumpy pipeline (we all trust that AI-aided scrapers will fix that). Gaps in the collection, unfortunately, do happen. Data Boutique provides a completeness indicator designed precisely for this.</p></li><li><p><strong>Point-in-time and gap-filling</strong>: Gap-filling is the technique of &#8220;filling the gap&#8221; in history by interpolating between two data points. The opposite (leaving the data as it is, without filling the gap or changing the content afterward) is point-in-time data. Data Boutique is committed to delivering data as close to the original format as possible, so it delivers point-in-time data. This ensures that buying historical data gives the same result as buying the data as it gets published.</p></li><li><p><strong>Date-picking method</strong>: When creating a monthly or weekly time series, data providers often choose one specific date for their collection (e.g., every first/last day of the month/week). On Data Boutique, these dates are consistent: the last day of the month/week (or the closest available).</p></li><li><p><strong>Quality Assurance</strong>: Quality procedures (domain, completeness, consistency, and ground truth checks) on Data Boutique are the same as those applied to current data. Again, this is to provide a uniform service for buyers who get historical data vs. those who get the data over time as it gets published.</p></li></ul><div><hr></div><h2>Final remarks</h2><p>Using historical data can be enormously powerful but carries a heavier workload. The points listed here are just some of the elements to consider. </p><p>As always, we encourage the community of developers and data providers to join the conversation on our channels (we have a friendly Discord server). 
We are happy that some data sellers have activated this option (historical data is an opt-in feature), and that some users have already experimented with historical data purchases.</p><p>Historical data has also been included in data bundles (our <a href="https://blog.databoutique.com/p/the-hidden-costs-of-asking-for-quotations">cost estimation tool</a> for large data packages), and we&#8217;ll soon add more features to play with.</p><p>That was all for this edition. </p><p>Thanks for reading,</p><p>Andrea</p><div><hr></div><h2><em>About Data Boutique</em></h2><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Historical data now available on DataBoutique.com]]></title><description><![CDATA[Web scraped historical data access]]></description><link>https://blog.databoutique.com/p/historical-data-now-available-on</link><guid isPermaLink="false">https://blog.databoutique.com/p/historical-data-now-available-on</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Mon, 03 Jun 2024 05:04:49 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Historical data now available on DataBoutique.com</strong></h1><h4>Web scraped historical data access</h4><p 
class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img 
src="https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080" width="5184" height="3456" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3456,&quot;width&quot;:5184,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;a car that is sitting in the middle of a body of water&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="a car that is sitting in the middle of a body of water" title="a car that is sitting in the middle of a body of water" srcset="https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1647831518297-9ac55f61636b?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmFjayUyMHRvJTIwdGhlJTIwZnV0dXJlfGVufDB8fHx8MTcxNzM1MTI2NHww&amp;ixlib=rb-4.0.3&amp;q=80&amp;w=1080 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="true">Iv&#225;n D&#237;az</a> 
on <a href="https://unsplash.com">Unsplash</a></figcaption></figure></div><h1>In Brief</h1><p>You can now purchase historical data for web-scraped datasets on DataBoutique.com. This allows you to access data collected in the past and integrate it into your applications, whether for analytics, SaaS, or AI.</p><p>Highlights: </p><ul><li><p><strong>Consistent structure</strong>: Historical data maintains the same structure as current data, with identical fields, definitions, and quality controls. You can seamlessly add future data to the same structure and use the same ETLs.</p></li><li><p><strong>Flexible pricing</strong>: Choose daily, weekly, or monthly granularity to match your budget and specific use case.</p></li><li><p><strong>Easy testing</strong>: Free samples are available, allowing you to review the data before making a full purchase.</p></li></ul><div><hr></div><h2><strong>What is Historical Data</strong></h2><h3>Definition</h3><p>Historical data refers to web data collected by our vendors in the past. Offering it allows data buyers to access the entire collection history of each vendor and enables vendors to further monetize their past efforts. 
</p><p>Historical data includes:</p><ul><li><p><strong>Removed Data</strong>: Information that is no longer available on a website, such as discontinued e-commerce products.</p></li><li><p><strong>Altered Data</strong>: Data that has been updated or changed, such as prices, locations, and the number of reviews.</p></li><li><p><strong>Seasonal Phenomena</strong>: Data on seasonal events like discounts or promotions valid only during specific times of the year.</p></li></ul><p>The Internet doesn't retain past data on its own, but capturing it at the time lets us reconstruct the past and use it effectively.</p><h3>Benefits and Use Cases</h3><p>Access to historical data is particularly valuable for applications such as:</p><ul><li><p>Market trend analysis</p></li><li><p>Competitive benchmarking</p></li><li><p>Legal compliance</p></li><li><p>AI training </p></li></ul><p>In market analysis, historical data helps identify trends over time, such as seasonal changes and shifts in consumer behavior. This enables businesses to make better strategic decisions and predict future market conditions.</p><p>For competitive benchmarking, historical data allows companies to analyze past market dynamics and competitor actions, helping them optimize their strategies and stay competitive.</p><p>Overall, historical data is a powerful tool for improving AI, understanding market trends, and enhancing business decisions.</p><h2>Features of Historical Data</h2><h3>Structure</h3><p>Historical data has the exact same structure as current data, which means:</p><ul><li><p><strong>Schema Consistency</strong>: You can find the structure definition under the &#8220;Schema&#8221; section of the platform.</p></li><li><p><strong>Seamless Integration</strong>: Use the same ETLs you use for current data, with minimal changes required.</p></li><li><p><strong>Expandable Database</strong>: Append future data to the historical collection effortlessly, allowing your database to grow over 
time.</p></li></ul><h3>Pricing</h3><p>Price transparency is our cornerstone. Historical data pricing is structured to fit smoothly into your budget:</p><ul><li><p><strong>Clear Pricing</strong>: Prices are clearly stated for all options.</p></li><li><p><strong>Cost Breakdown</strong>: Prices are linked to the cost of extraction, the number of snapshots included, and the depth of history. Shorter histories are priced lower, while finer granularity raises the overall cost.</p></li><li><p><strong>Flexible Packages</strong>: Choose from different packages based on the level of detail needed (daily, weekly, or monthly).</p></li></ul><h3>Testing</h3><p>Trying before buying is crucial for data purchases. Aside from free samples, which are also available for historical data, users interested in high-granularity data (like daily data) can first test the collection's effectiveness with low-granularity data (like monthly data) at a fraction of the cost before committing to a larger purchase.</p><div><hr></div><h2>How to Access Historical Data</h2><h3>For data buyers</h3><p>When a data vendor opts to activate history, you will find the available historical data details listed on the dataset. 
You can purchase the historical data separately from the current data extraction by clicking the dedicated button below each historical data option.</p><p>For example, <a href="https://www.databoutique.com/buy-data-page-detail/cettire+dataset+by+re+analytics+srl/r/recKy6Enw62A3MtEv">The Cettire US pricing dataset</a>, currently trading at 9.00 EUR per collection (the full catalog of a single day of the e-commerce website)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!my3n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!my3n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 424w, https://substackcdn.com/image/fetch/$s_!my3n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 848w, https://substackcdn.com/image/fetch/$s_!my3n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 1272w, https://substackcdn.com/image/fetch/$s_!my3n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!my3n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png" width="928" height="323" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:323,&quot;width&quot;:928,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21556,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!my3n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 424w, https://substackcdn.com/image/fetch/$s_!my3n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 848w, https://substackcdn.com/image/fetch/$s_!my3n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 1272w, https://substackcdn.com/image/fetch/$s_!my3n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44a668d1-d7ac-4bcc-b05a-ba97aae0982c_928x323.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>has a history dating back 11 months (if you read this post in the future, you&#8217;ll find a longer history) for 99.00 EUR</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4A7U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4A7U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 424w, 
https://substackcdn.com/image/fetch/$s_!4A7U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 848w, https://substackcdn.com/image/fetch/$s_!4A7U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 1272w, https://substackcdn.com/image/fetch/$s_!4A7U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4A7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png" width="916" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:916,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25010,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4A7U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 424w, 
https://substackcdn.com/image/fetch/$s_!4A7U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 848w, https://substackcdn.com/image/fetch/$s_!4A7U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 1272w, https://substackcdn.com/image/fetch/$s_!4A7U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb914fef4-06ba-4406-98a0-fa0238d8c2e4_916x372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A 50-week history (weekly detail) for 360.00 EUR</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yomy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yomy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 424w, https://substackcdn.com/image/fetch/$s_!Yomy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 848w, https://substackcdn.com/image/fetch/$s_!Yomy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 1272w, https://substackcdn.com/image/fetch/$s_!Yomy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yomy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png" width="962" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:962,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yomy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 424w, https://substackcdn.com/image/fetch/$s_!Yomy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 848w, https://substackcdn.com/image/fetch/$s_!Yomy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 1272w, https://substackcdn.com/image/fetch/$s_!Yomy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F327870d0-262d-4027-8dfd-4cb7b486b26b_962x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Or a daily history, dating back to June 15th, 2023, for 1,521.00 EUR</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8ugv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8ugv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 424w, https://substackcdn.com/image/fetch/$s_!8ugv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 848w, 
https://substackcdn.com/image/fetch/$s_!8ugv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 1272w, https://substackcdn.com/image/fetch/$s_!8ugv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8ugv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png" width="940" height="277" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:277,&quot;width&quot;:940,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18362,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8ugv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 424w, https://substackcdn.com/image/fetch/$s_!8ugv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 848w, 
https://substackcdn.com/image/fetch/$s_!8ugv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 1272w, https://substackcdn.com/image/fetch/$s_!8ugv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b1605f2-e375-4af8-a1c8-919f08ce257a_940x277.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Hitting the &#8220;buy&#8221; button will give you access to the history file you&#8217;re looking for.</p><h3>For data 
sellers</h3><p>Activating history is an opt-in feature, and it will soon become a self-service option. If you want early access to sell the historical data you have already uploaded, or wish to add historical data collected before joining Data Boutique, please contact us on Discord or simply reply to this email.</p><div><hr></div><h2>QA and Pricing of Historical Data</h2><h3>QA (Quality Assurance)</h3><p>The same QA processes used for current data are applied to historical data. Historical data is what data vendors have submitted over time as current data. Once validated, it is stored and made available for bulk purchase if the vendor agrees. The QA process includes:</p><ul><li><p><strong>Domain Validation</strong>: Datasets are linked to a schema, with each field validated against a domain (e.g., currency, country codes, prices). This ensures standardization and excludes unwanted content, such as personally identifiable information (PII).</p></li><li><p><strong>Website Validation</strong>: Before acceptance on Data Boutique, all websites are verified for public accessibility and compliance with click-wrap terms of service.</p></li><li><p><strong>Completeness Validation</strong>: Peer review verifications (independent checks on expected content) are continuously performed on submitted content.</p></li><li><p><strong>Time-Series Consistency Validation</strong>: The content (e.g., price trends and item counts) is checked against its own history to ensure the stability of published data.</p></li></ul><h3>Pricing</h3><p>The cost of extracting data varies significantly from website to website, depending on the effort and expenses required. Data vendors set their own prices, and historical data follows the same principle.</p><p>Historical data is priced based on the number of snapshots contained in the selected historical detail. 
The more data points a history file has, the higher its price: the price reflects the cost of each single extraction, with a discount factor that depends on the length of the period covered.</p><div><hr></div><h2>Next on Our Roadmap</h2><p>Enabling historical data was a significant milestone on our roadmap. Next, we plan to:</p><ul><li><p><strong>Integrate Historical Data into Data Bundles</strong>: Make historical data purchases available as part of our "data bundles."</p></li><li><p><strong>Activate "Request to Activate History" Feature</strong>: Allow buyers to request the activation of historical data from sellers.</p></li><li><p><strong>Link Historical Data Frequency to Future Updates</strong>: Connect the frequency of historical data updates to future data updates for seamless integration.</p></li></ul><p>Stay tuned for upcoming releases, and spread the word in case this email was forwarded to you.</p><div><hr></div><h2><em>About Data Boutique</em></h2><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. 
We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[The Web Data Landscape Map: Visualizing The Ecosystem]]></title><description><![CDATA[Mapping actors and solutions]]></description><link>https://blog.databoutique.com/p/the-web-data-landscape-map-visualizing</link><guid isPermaLink="false">https://blog.databoutique.com/p/the-web-data-landscape-map-visualizing</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Thu, 18 Apr 2024 14:31:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x08l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>The Web Data Landscape Map: Visualizing The Ecosystem</strong></h1><h4>Mapping actors and solutions</h4><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><h3><strong>Why a Map?</strong></h3><p>The &#8220;Map&#8221; is a global directory of all actors working with web data: from extracting to cleaning, trading to analyzing, and turning it into insights.</p><p>We needed a map because we trade with so many types of users that multiplying the connections between them is highly valuable for everyone, including 
us.</p><p>Drawing inspiration from similar initiatives like Crunchbase, we have opened our archives to the public for collective enrichment. What originated in our personal notepads, Notion pages, and Airtable documents is now a community-driven resource.</p><p>Web data is a <strong>very fragmented ecosystem</strong>, which is thrilling, but fragmentation can sometimes hold back growth. Hence the initiative.</p><p>Want to be included? Read on.</p><h3>Who can be on the Map?</h3><h4>Actors</h4><p>The <strong>actors</strong> involved are companies, organizations, and freelancers (within the boundaries of privacy laws) who explicitly offer solutions related to web data. Size, geography, and sector do not matter: anyone who helps others access, use, and make sense of web data qualifies. </p><p>Anyone can suggest a company for inclusion in the list.</p><h4>Solutions</h4><p>A <strong>solution</strong> is a service commonly offered on the market that is used for, with, or in relation to web data. The guideline is that it must clearly relate to web data. </p><p>Mobile proxies, dynamic pricing, competitive intelligence, and data marketplaces are valid solutions; enterprise ERP and CRM are not.</p><p>Solutions are <strong>specific</strong> (e.g., Hotel Dynamic Pricing, Residential Proxy) and don&#8217;t refer to particular providers or product names.</p><p>It is the solution that qualifies an actor for the map. A company that covers various services is eligible if at least one of them relates to web data (e.g., consulting on how to integrate web data), as long as its service portfolio references it explicitly.</p><p>Actors can be associated with more than one solution, and this can evolve over time. Any user can suggest a new solution, as well as new or discontinued relationships between actors and solutions.</p><h3>Who can contribute</h3><p>Building an ecosystem map is a collective effort. As in the Crunchbase model, anyone can suggest entries. 
Contributions will be moderated to keep the quality of the map high.</p><p>Actors and Solutions can be suggested by:</p><ol><li><p><strong>Registered Users</strong>: Any individual can sign up for a free account on Data Boutique and start contributing. Registered users can add profiles of companies, solutions, and individuals involved in the web data ecosystem.</p></li><li><p><strong>Company Representatives:</strong> Individuals affiliated with a company, such as founders, employees, or authorized representatives, can claim their company's profile on Data Boutique. By claiming a profile, they can update and manage the information presented about their company, keeping it accurate and up to date, and post news and links to external material. </p></li></ol><h3>How to Consult the Map</h3><p>The Ecosystem map is public; anyone, registered or not, can access it.</p><p>The map is organized into two directories: Actors, the directory of companies, and Solutions, the directory of solutions. The two directories are connected by actor-solution links.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/ecosystem&quot;,&quot;text&quot;:&quot;Go to Ecosystem Map&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/ecosystem"><span>Go to Ecosystem Map</span></a></p><div><hr></div><h3>How the Ecosystem Map works</h3><p>Our model operates under a straightforward rule: to qualify for inclusion, actors must provide a verifiable service related to web data.</p><p>We categorize these services broadly, accommodating a diverse spectrum from scalable <strong>one-to-many</strong> services (like proxy network providers and scraping tools) to bespoke <strong>one-to-one</strong> solutions tailored for specific needs, such as custom data extraction or consultancy for business intelligence and analytics. 
Additionally, our map includes everything from basic, low-level <strong>raw data services</strong> (data feeds) to more complex, high-level offerings that provide <strong>insights</strong> (market intelligence) and everything in between.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x08l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x08l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 424w, https://substackcdn.com/image/fetch/$s_!x08l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 848w, https://substackcdn.com/image/fetch/$s_!x08l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 1272w, https://substackcdn.com/image/fetch/$s_!x08l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x08l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png" width="1456" height="1203" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1203,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:249824,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x08l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 424w, https://substackcdn.com/image/fetch/$s_!x08l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 848w, https://substackcdn.com/image/fetch/$s_!x08l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 1272w, https://substackcdn.com/image/fetch/$s_!x08l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71b2defe-7029-4f33-9894-3457bcb2d95c_2190x1810.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Benefits of Being on the Map</h3><h4>Actors: Exposure and lead generation</h4><p>Organizations benefit from the map by gaining exposure to a wider, targeted audience at the exact moment that audience is looking for a solution. They can state clearly what their services are and where they position themselves in the market.</p><p>Posting relevant updates builds brand recall and awareness among a specific audience, clustered by search intent (technology, sector, solution). The ability to include CTA links in posts also enables lead generation.</p><h4>Users: Fast access to service providers</h4><p>Data buyers can find who is offering the solutions they are looking for, have faster access to solutions, and understand the different offerings. The faster users extract value from data, the more they use it, which, in the end, is our goal. </p><div><hr></div><h3>Want to be featured?</h3><p>You&#8217;ve reached the right place! 
Simply go to <a href="https://www.databoutique.com">databoutique.com</a>, register for free, and submit your company from the <a href="https://www.databoutique.com/ecosystem">actor&#8217;s page</a> of the Ecosystem Map.</p><p>We&#8217;ll post more on this map's evolution in this newsletter's coming editions!</p><div><hr></div><h2><em>About Data Boutique</em></h2><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[How Data Boutique Works #3: Third Party Solutions]]></title><description><![CDATA[Working with data? List your solution on Data Boutique]]></description><link>https://blog.databoutique.com/p/how-data-boutique-works-3-third-party</link><guid isPermaLink="false">https://blog.databoutique.com/p/how-data-boutique-works-3-third-party</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Thu, 04 Apr 2024 04:02:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!d4Un!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>How Data Boutique Works #3: Third-Party Solutions</strong></h1><h4>Working with data? 
List your solution on Data Boutique</h4><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d4Un!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d4Un!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 424w, https://substackcdn.com/image/fetch/$s_!d4Un!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 848w, https://substackcdn.com/image/fetch/$s_!d4Un!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!d4Un!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d4Un!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg" width="1400" 
height="1400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1400,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;A platform for third-party solutions built on top of data&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="A platform for third-party solutions built on top of data" title="A platform for third-party solutions built on top of data" srcset="https://substackcdn.com/image/fetch/$s_!d4Un!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 424w, https://substackcdn.com/image/fetch/$s_!d4Un!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 848w, https://substackcdn.com/image/fetch/$s_!d4Un!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!d4Un!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d36b2e-0912-4259-bb69-3078225b4c90_1400x1400.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A platform for third-party solutions built on top of data</figcaption></figure></div><h3><strong>An Open Platform Optimized for Web Data Utilization</strong></h3><p>Data Boutique&#8217;s mission is crystal clear: <strong>Maximize web data adoption</strong>.</p><p>To achieve that, we built a platform with the highest exposure to web data extractions. We stay <strong>agnostic</strong> about how data is used: it could be embedded in SaaS solutions, market research, or AI training.</p><p>But not all users require raw data. Often, the need (expressed or not) is for more complex solutions that involve enrichment, transformation, aggregation, and extraction of insights, frequently in graphical form. 
</p><p>End-users might land on Data Boutique when attracted by the &#8220;website name&#8221; they want to monitor (IKEA, <a href="https://www.databoutique.com/buy-data-list-subset/h_m_web_scraped_data/r/recQgyXXaEmhT6kSf">H&amp;M</a>, <a href="https://www.databoutique.com/buy-data-list-subset/nike_web_scraped_data/r/recdDfxTw6LXgZ39O">Nike</a>, <a href="https://www.databoutique.com/buy-data-list-subset/gucci_web_scraped_data/r/recr1rIJ15FC7ckn4">Gucci</a>, <a href="https://www.databoutique.com/buy-data-list-subset/tesco_web_scraped_data/r/rec7ZvjQQREv5TEWA">Tesco</a>, <a href="https://www.databoutique.com/buy-data-list-subset/douglas_web_scraped_data/r/recUb4cVZVNuiErgr">Douglas</a>, Walmart, etc.), but the solution they need might go beyond raw data.</p><p>That is why Data Boutique hosts third-party solutions: market intelligence, dynamic pricing, research, consultancy, AI, and commerce analytics that are ready to be used. It also hosts custom extraction services, scraping technology, data handling, vertical industry expertise, and additional research. 
Raw data can&#8217;t talk without the help of this expertise.</p><p>Because these solutions are hosted on Data Boutique, they are shown to users right when they search for that data.</p><div><hr></div><h3>Who Lists on Data Boutique?</h3><ul><li><p><strong>Scraping Technology Providers</strong>: Our community is primarily made up of data scrapers, both freelancers and data farms, who represent the ideal audience for scraping technology providers, from anti-blocker solutions to AI-based scraping and proxy network providers.</p></li><li><p><strong>Data providers</strong>: Even if not actively selling on Data Boutique, web data providers are present on the platform to intercept non-standard, custom-made requests from data buyers; these can range from scraping websites not listed on Data Boutique to performing extractions at custom frequencies.</p></li><li><p><strong>SaaS and AI platforms</strong>: Market intelligence, commerce analytics, and industry-specific tracking tools have the perfect spot to showcase their platform, capturing users at their exact point of attention.</p></li><li><p><strong>System integration and consulting</strong>: Companies building custom solutions and bringing their vertical industry expertise to end-users.</p></li><li><p><strong>ETLs, Quality Assurance, BI, ML tools</strong>: A lot is happening in the big data scene, including technology, databases, and professions. 
Emerging solutions can meet new users when they are searching for data.</p></li></ul><p>   </p><h5><em>Scraping technology providers currently listed</em></h5><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0pu3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0pu3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 424w, https://substackcdn.com/image/fetch/$s_!0pu3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 848w, https://substackcdn.com/image/fetch/$s_!0pu3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 1272w, https://substackcdn.com/image/fetch/$s_!0pu3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0pu3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png" width="1211" height="384" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:384,&quot;width&quot;:1211,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:120553,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0pu3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 424w, https://substackcdn.com/image/fetch/$s_!0pu3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 848w, https://substackcdn.com/image/fetch/$s_!0pu3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 1272w, https://substackcdn.com/image/fetch/$s_!0pu3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba3826f8-6a5d-4c1d-a4e4-087fc5cd8c60_1211x384.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h5><em>Scraping solution profile page of Kamaleo</em></h5><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WN2z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WN2z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 424w, https://substackcdn.com/image/fetch/$s_!WN2z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 848w, 
https://substackcdn.com/image/fetch/$s_!WN2z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 1272w, https://substackcdn.com/image/fetch/$s_!WN2z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WN2z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png" width="767" height="1144" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1144,&quot;width&quot;:767,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184452,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WN2z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 424w, https://substackcdn.com/image/fetch/$s_!WN2z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 848w, 
https://substackcdn.com/image/fetch/$s_!WN2z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 1272w, https://substackcdn.com/image/fetch/$s_!WN2z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefc4f188-cac9-4d09-8d8f-3894cee8c8d4_767x1144.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h5><em>Industry expert in grocery</em></h5><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" 
target="_blank" href="https://substackcdn.com/image/fetch/$s_!68rb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!68rb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 424w, https://substackcdn.com/image/fetch/$s_!68rb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 848w, https://substackcdn.com/image/fetch/$s_!68rb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 1272w, https://substackcdn.com/image/fetch/$s_!68rb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!68rb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png" width="776" height="818" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:776,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:394787,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!68rb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 424w, https://substackcdn.com/image/fetch/$s_!68rb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 848w, https://substackcdn.com/image/fetch/$s_!68rb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 1272w, https://substackcdn.com/image/fetch/$s_!68rb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5f63c5b-a7ea-4333-a304-3abda951a634_776x818.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3>Why List on Data Boutique?</h3><p>Data Boutique uses a bottom-up approach to make data easily discoverable by data scientists, market analysts, and other stakeholders. This means the audience is diverse in geography, industry, and organization size.</p><p>This makes listing on Data Boutique interesting for solution providers for the following reasons:</p><h4>Extended Reach</h4><p>Data Boutique&#8217;s reach is niche but widespread, extending the solution provider&#8217;s audience and raising awareness of it in an otherwise very fragmented market.</p><h4>Brand Recall</h4><p>Solution providers reinforce their message with additional touch points when <em>their existing</em> clients visit Data Boutique, which helps keep those clients from drifting away.</p><h4>Lead Generation</h4><p>A custom call-to-action, alongside use cases and marketing material, can convert interest into leads.</p><h3>Value for Data Boutique</h3><p>Listing solutions on Data Boutique is free. 
The goal is twofold: to retain users who are not looking for raw data, and to encourage them to use data-based solutions, which ultimately leads to more demand for raw data.</p><div><hr></div><h3>How to List a Solution?</h3><p>If you work with data and would like to feature your solution, the first step is to create a company profile. From your profile, click the &#8220;Add company&#8221; button and follow the instructions to create a solution.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NfUd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NfUd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 424w, https://substackcdn.com/image/fetch/$s_!NfUd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 848w, https://substackcdn.com/image/fetch/$s_!NfUd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!NfUd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NfUd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png" width="456" height="220.17032967032966" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:703,&quot;width&quot;:1456,&quot;resizeWidth&quot;:456,&quot;bytes&quot;:381109,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NfUd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 424w, https://substackcdn.com/image/fetch/$s_!NfUd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 848w, https://substackcdn.com/image/fetch/$s_!NfUd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!NfUd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed8051cd-2e1e-4d1a-934a-d381254b9e6e_2273x1098.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UQxd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UQxd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 424w, https://substackcdn.com/image/fetch/$s_!UQxd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 848w, https://substackcdn.com/image/fetch/$s_!UQxd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 1272w, https://substackcdn.com/image/fetch/$s_!UQxd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UQxd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png" width="452" height="265.4258241758242" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:855,&quot;width&quot;:1456,&quot;resizeWidth&quot;:452,&quot;bytes&quot;:269295,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UQxd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 424w, https://substackcdn.com/image/fetch/$s_!UQxd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 848w, https://substackcdn.com/image/fetch/$s_!UQxd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 1272w, https://substackcdn.com/image/fetch/$s_!UQxd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43a4d061-52d8-4151-b0ad-c60e2b6795a5_2273x1334.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you want more info on how to list your solution, reach out, and let&#8217;s talk!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://calendly.com/databoutique&quot;,&quot;text&quot;:&quot;List your Solution&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://calendly.com/databoutique"><span>List your Solution</span></a></p><div><hr></div><h2><em>About Data Boutique</em></h2><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. 
We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Case Study: MatchesFashion Inventory Outflows]]></title><description><![CDATA[Web data for market insights]]></description><link>https://blog.databoutique.com/p/case-study-matchesfashion-inventory</link><guid isPermaLink="false">https://blog.databoutique.com/p/case-study-matchesfashion-inventory</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Mon, 25 Mar 2024 05:02:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Y3Yk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><h1>Case Study: MatchesFashion Inventory Outflows</h1><p>In a <a href="https://blog.databoutique.com/p/web-data-in-emergency-situations">previous discussion</a>, we explored the use of web-based data to monitor Key Performance Indicators (KPIs) during significant market events, including the closure of MatchesFashion.com.</p><p>This detailed examination focuses on the strategic application of inventory-level data, acquired through web scraping techniques, to enhance 
competitive and investment research within the retail sector.</p><h3><strong>Analytical Perspective: Understanding Market Dynamics</strong></h3><p>Our investigation centers on the impact of a major retailer, with annual net revenues surpassing <a href="https://ecommercedb.com/store/matchesfashion.com">400 million USD</a>, ceasing operations during a peak period of full-price sales. This situation often leads to market disruptions, as competitors may face challenges when inventory is released into the market at substantially reduced prices.</p><h3><strong>Research Hypothesis: Evaluating Inventory Redistribution</strong></h3><p>We aim to assess the reallocation of inventory in the wake of a retailer's closure. By comparing inventory data at a detailed SKU level from March 11th, 2024, with data from March 20th, we can observe changes in inventory levels across different brands.</p><h3><strong>Key Considerations</strong></h3><ul><li><p>The analysis period spans nine days, a concise timeframe that, despite its brevity, offers valuable insights due to the recency of the event. Analysts are encouraged to extend this research to gain a broader perspective, as the data remains accessible.</p></li><li><p>We assume no new inventory additions during this period, considering the retailer's declared cessation of business. Therefore, observed net changes are attributed solely to inventory outflows.</p></li><li><p>It's important to note that inventory reductions are not exclusively due to direct sales. Other factors include stock clearance to discount outlets, returns to brands under concession agreements, and other non-sale channels.</p></li><li><p>Inventory outflows captured in this analysis may not align with consumer purchase data from credit card transactions, especially if inventory is reclaimed by brands, sold through B2B channels, or sold in regions outside the credit card data provider's scope.</p></li><li><p>The analysis values inventory outflows at post-discount retail prices. 
However, actual sales may incorporate additional reductions, B2B transactions may occur at lower wholesale prices, and returns to brands might not be monetarily compensated.</p></li><li><p>This study does not provide geographical specifics. The data was sourced from the UK version of the retailer's website, which caters to multiple regions, leaving the final destination of sold items indeterminate.</p></li></ul><h3>Where to find the Data</h3><p>The analysis was based entirely on <a href="https://www.databoutique.com/buy-data-list-subset/matches_fashion_web_scraped_data/r/recbesrVmk1iPu4AI">Inventory data</a> available on <a href="https://www.databoutique.com/">Data Boutique</a> and updated daily. Any user, analyst, or researcher can access and download the latest data to continue this study.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/buy-data-list-subset/matches_fashion_web_scraped_data/r/recbesrVmk1iPu4AI&quot;,&quot;text&quot;:&quot;Go to Data&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/buy-data-list-subset/matches_fashion_web_scraped_data/r/recbesrVmk1iPu4AI"><span>Go to Data</span></a></p><h3><strong>Research Findings</strong></h3><p>Our focus was on the top 50 brands, ranked by the number of items reduced over the 9-day period. 
The analysis presents both the percentage change in units and their corresponding value at retail prices.</p><p>A greater decline in value compared to units suggests that higher-priced items were predominantly affected.</p><p>Overall, the findings reveal a 3% reduction in units and a 5% decrease in value among the top 50 brands within the nine-day timeframe, detailed further in the accompanying chart.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y3Yk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 424w, https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 848w, https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 1272w, https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png" 
width="1456" height="2820" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1902015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 424w, https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 848w, https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 1272w, https://substackcdn.com/image/fetch/$s_!Y3Yk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c62f0d7-b870-4c92-a0fc-9ed8c4de498e_3493x6765.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This case study exemplifies the potent application of web-scraped data for in-depth market analysis, offering valuable insights for retailers, brands, market analysts, and professionals in the luxury, fashion, and web-scraping domains.</p><div><hr></div><h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. 
We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[How Data Boutique Works #2: Volume Discounts]]></title><description><![CDATA[The volume-related pricing system explained]]></description><link>https://blog.databoutique.com/p/how-data-boutique-works-2-volume</link><guid isPermaLink="false">https://blog.databoutique.com/p/how-data-boutique-works-2-volume</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Mon, 18 Mar 2024 05:01:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!jbVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. 
We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>How Data Boutique Works #2: Volume Discounts</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jbVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jbVN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jbVN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jbVN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jbVN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jbVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg" width="960" height="720" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:720,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:131265,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jbVN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 424w, https://substackcdn.com/image/fetch/$s_!jbVN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 848w, https://substackcdn.com/image/fetch/$s_!jbVN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!jbVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50540aa1-6d67-41da-84b7-df467362cadf_960x720.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Container-ship loading at Hamburg Port</figcaption></figure></div><h2>Large volumes of data</h2><p>Offering discounts for large-size orders is an ordinary component of commerce. Sellers give up a portion of the unit margins in exchange for a bigger unit count.</p><p>This is also true for web-scraped data. Since Data Boutique's mission is to make data trades smoother, an embedded volume-related discount system is in place.</p><h2>Understanding buying needs</h2><p>There are multiple reasons why data buyers want different volumes of data, depending on their data usage, development stage, and horizon. The same buyer can behave differently over time or in different projects.</p><h3>Retail vs. 
Professional data buying</h3><p><strong>Retail</strong> data buying refers to frugal, one-off data purchases. The buyer picks one dataset from the shelf and buys it, perhaps for a research project or to study its content more closely. Retail buyers typically buy low volumes of data, and their purchases are difficult to forecast because they are not planned in advance. Individually, they are not profitable for data sellers, but in aggregate they are, albeit with a high-risk profile.</p><p><strong>Professional</strong> data buying refers to systematically using datasets within company processes, with recurring data updates and a known set of websites to monitor. It is a planned, regular, repeated data order, often very large, consisting of hundreds or thousands of datasets. It represents a highly stable revenue stream for data sellers, but the scale of these buyers&#8217; operations requires keeping costs as low as possible, which calls for substantial volume discounting. </p><p><strong>Professionals start as Retail</strong>: The journey to large, wholesale purchases starts with one buy. It&#8217;s faster, cleaner, and fairer than the <em>free trials</em> in the rest of the data industry. Free trials are good for proprietary, unique datasets but not so much for web-scraped content.</p><h3>Build new vs. Consolidate existing</h3><p>When purchasing data, the more certain we are of the scope, the more discounts we can leverage. We have less leverage for discounts in a newly designed BI system where the scope is not yet consolidated (we might want to test which websites are more interesting to monitor). 
But once this scope is certain, we can increase the discount by committing to longer contracts.</p><p>This dynamic approach increases flexibility: Data buyers can try new datasets to see which has the most value and then later consolidate the scope to reduce costs.</p><p>Example:</p><blockquote><p><em>If I want to build a discount alert system, but I&#8217;m not yet sure which websites are more interesting to watch, in how many countries, or whether to run it weekly or daily, I could do short-term tests on each website before confirming that I want them regularly.</em> </p></blockquote><div><hr></div><h2>The volume-related pricing system variables</h2><p>At Data Boutique, we apply the following elements of volume discounting, and they all combine to determine the final price paid: </p><h3>Order size</h3><p>The order size discount is proportional to the pure economic size (EUR value) of the order: Generally speaking, the bigger an order is (more websites, more countries, higher frequency), the more discounted it is.  </p><h3>Frequency</h3><p>The higher the refresh frequency, the higher the discount. Discounting increases progressively as the refresh schedule moves from a single delivery to monthly, weekly, twice a week, and daily. </p><h3>Commonality</h3><p>Buying different extractions from the same website is discounted more than buying extractions from different websites. This is related to the difficulty of web scraping. Example: Ikea Italy is similar to Ikea France, so I get a discount when buying Italy AND France together, rather than buying them separately.</p><h3>Duration</h3><p>The longer the time horizon, the higher the discount. 
Data Boutique currently supports a 12-month minimum duration, or a 36-month minimum duration for buyers with a long-term perspective, which offers a substantial opportunity to reduce data costs.</p><h3>Loyalty</h3><p>The more active a buyer is, measured in Recency, Monetary, and Frequency terms (RMF), the more they add value to the network; part of this value is given back to the buyer through discounts. The more active a buyer is, the more advantages there are.</p><div><hr></div><h2>Transparency is speed</h2><p>Buying and selling are tough elements of human interactions. Our job is to remove all obstacles that prevent this exchange from happening smoothly.</p><p>The faster the trade happens, the less money both parties waste.</p><p>Thanks for reading so far. Subscribe to this newsletter or join <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a> to speak with us.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[How Data Boutique Works #1: Data Auctions]]></title><description><![CDATA[The data auction system explained]]></description><link>https://blog.databoutique.com/p/how-data-boutique-works-1-data-auctions</link><guid isPermaLink="false">https://blog.databoutique.com/p/how-data-boutique-works-1-data-auctions</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Mon, 11 Mar 2024 05:58:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1ieb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is the data marketplace for web scraping. We make buying and selling data faster and safer for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>How Data Boutique Works #1: Data Auctions</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1ieb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1ieb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1ieb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1ieb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!1ieb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1ieb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:456349,&quot;alt&quot;:&quot;Tsukiji Fish Market auction, Tokyo&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tsukiji Fish Market auction, Tokyo" title="Tsukiji Fish Market auction, Tokyo" srcset="https://substackcdn.com/image/fetch/$s_!1ieb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 424w, https://substackcdn.com/image/fetch/$s_!1ieb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 848w, https://substackcdn.com/image/fetch/$s_!1ieb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!1ieb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10776718-8724-44fc-8ba3-9f3e7233ee5e_3000x2000.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tsukiji Fish Market auction, Tokyo</figcaption></figure></div><h2>Data Auctions</h2><p>Data Boutique is a web-scraped data marketplace where prices are set with an auction system between buyers and sellers. 
</p><p>Prices are public and transparent to everyone.</p><h3>What is negotiated: The price for a full website scan</h3><p>Sellers and buyers negotiate the <em>price of a full website scan of a single date</em>. </p><p>Example:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hTeG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hTeG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 424w, https://substackcdn.com/image/fetch/$s_!hTeG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 848w, https://substackcdn.com/image/fetch/$s_!hTeG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 1272w, https://substackcdn.com/image/fetch/$s_!hTeG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hTeG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png" width="1456" height="379" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:379,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128178,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hTeG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 424w, https://substackcdn.com/image/fetch/$s_!hTeG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 848w, https://substackcdn.com/image/fetch/$s_!hTeG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 1272w, https://substackcdn.com/image/fetch/$s_!hTeG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1052b6ba-9500-47a1-9881-3f5bc598c63b_2179x567.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p><em>In this example, the Seller asks 6.00 EUR for a full <a href="https://www.databoutique.com/buy-data-list-subset/mr_porter_web_scraped_data/r/recElhzTKV1bLKLto">Mr. Porter's scan</a>. </em></p><p><em>If I wanted to purchase this dataset once, I&#8217;d pay 6.00 EUR. 
If I wanted the same dataset every week, the cost would be 24 EUR per month (4 weeks) or 312 EUR per year (52 weeks).</em></p></blockquote><p>When buyers and sellers negotiate, they negotiate on the 6 EUR price.</p><p>Each website is negotiated and priced independently: The <a href="https://www.databoutique.com/buy-data-list-subset/norauto_web_scraped_data/r/recXB6Oi9ARXLoBVx">Norauto</a> dataset is priced differently than the <a href="https://www.databoutique.com/buy-data-list-subset/mr_porter_web_scraped_data/r/recElhzTKV1bLKLto">Mr. Porter</a> dataset.</p><div><hr></div><h3>Buyers: the maximum price willing to pay (bid)</h3><p>Buyers can express the highest price they are willing to pay (bid). This will be visible anonymously to all sellers.</p><p>Bids can be placed on existing websites (those already trading on Data Boutique) and on websites not yet listed (where no seller is offering them for a price yet).</p><blockquote><p><em>Example: A dataset has an asked price of 50 EUR, and a buyer bids 10 EUR. All sellers get notified of the bid. If a seller finds it profitable, they may accept it or continue negotiating.</em></p></blockquote><p>If a bid (buyer&#8217;s price) is too low, it might not attract sellers, as they will not profit from it. </p><p>Factors <strong>buyers</strong> need to consider before making a bid:</p><ul><li><p><strong>Urgency</strong>: The more urgent the need, the higher the price, as you need an attractive offer. </p></li><li><p><strong>Commitment</strong>: In the eyes of a seller, buying data only once is less interesting than buying data every week or every day for a year. This information is part of your bid, so one-off purchases might need a higher bid price than continuous (daily, weekly) ones to be considered interesting.</p></li><li><p><strong>Availability</strong>: If a website is not yet on Data Boutique, the bid would be the sole revenue for the seller. 
Being the first to ask for a website might require a more generous bid to be attractive.</p></li></ul><div><hr></div><h3>Sellers: The minimum price willing to take (ask)</h3><p>When a dataset is published, the price shown is the asked price.</p><p>The asked price is public and applied to all buyers. </p><p>When an ask is too high, few (or no) buyers will purchase the dataset, as they will find more convenient ways to collect the data, or their use case will no longer be profitable.</p><p>Factors sellers need to consider before setting the asked price:</p><ul><li><p><strong>Existing vs. new extractions</strong>: If you are already scraping a website, the additional costs to list it on Data Boutique are minimal, and you can afford to ask for a lower price. Building a new extraction from scratch will cost you more. List the websites you already scrape first, since you have more margin on them.</p></li><li><p><strong>First is better than lowest</strong>: It is less efficient to engage in downward price spirals with other sellers than it is to be the only one serving a website. If you can choose, go for a website with no other sellers.</p></li><li><p><strong>Buyer alternatives</strong>: This is the real price cap. Buyers always have the alternatives of commissioning web scraping from someone else or web scraping internally, so the price must be competitive against those options. If the asked price is too high, some buyers might even give up on the dataset altogether.</p></li></ul><div><hr></div><h3>Pre-negotiation</h3><p>Buyers are often not ready to buy. They need a price quote before they can make a decision.</p><p>Sellers, on their side, don&#8217;t want to pay the extraction costs if they are unsure they will make the sale. 
</p><p>This is where pre-negotiations step in: they state <em>intentions</em> to buy and sell under specific conditions <em>before</em> money is spent or data is extracted.</p><blockquote><p><em>Example: A buyer might be interested in a website, but has no idea about the cost. They bid 10 EUR (with no additional information, buyers will bid low). A seller replies that the minimum ask will be 100 EUR (they have no additional reassurance of future purchases, so they ask high). The buyer may then consider raising the bid to 30 EUR, including a commitment to weekly data refreshes. This will be more attractive for sellers.</em></p></blockquote><p>Additional options like <strong><a href="https://www.databoutique.com/faq-article/what%2520is%2520automatic%2520purchase%2520%2528ap%2529%2520on%2520data%2520boutique%253f/r/recouTgEE9vP5cL6I">automatic purchase (AP)</a></strong> - the automatic execution of the order when a seller matches the bid - or <strong><a href="https://www.databoutique.com/faq-article/what%2520is%2520the%2520delivery%2520assurance%2520deposit%2520%2528dad%2529%2520on%2520data%2520boutique%253f/r/recWLGjJWkPlNixLr">Delivery Assurance Deposit (DAD)</a></strong>, a small confirmation fee deposited to the seller upon delivery, deducted from the final price if the buyer confirms the purchase, also lower the risks for the buyers and sellers involved in the trade.</p><div><hr></div><h2>Subscribe to get updates</h2><p>Thanks for reading so far. If you like collecting, using, analyzing, squeezing, or torturing data, visit our website and subscribe to this newsletter.</p><p>Want to speak to us? 
Join <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[Web Data in Emergency Situations: The MatchesFashion case]]></title><description><![CDATA[More case studies for web data]]></description><link>https://blog.databoutique.com/p/web-data-in-emergency-situations</link><guid isPermaLink="false">https://blog.databoutique.com/p/web-data-in-emergency-situations</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Fri, 08 Mar 2024 17:47:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qUwI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a data marketplace focused on web scraping. 
We make it simpler to match those who collect data with those who know how to use it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Web data in emergency situations: The MatchesFashion case</h1><h3>The Emergency:</h3><p>On March 7th, <a href="https://www.businessoffashion.com/articles/retail/matchesfashion-is-shutting-down/">Business of Fashion</a> reported that <a href="https://www.matchesfashion.com/">Matches</a> (formerly MatchesFashion) will be shut down by its new owner, Frasers Group. <a href="https://hypebeast.com/2023/12/matchesfashion-matches-sale-frasers-group-apax-partners-luxury">Frasers bought the retailer for 52M GBP from Apax Partners</a> just two months earlier. In 2017, Apax Partners purchased MatchesFashion at a reported $1 billion valuation. 
</p><blockquote><p>A relevant online retailer shuts down, and this means both trouble and opportunity for many businesses.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qUwI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qUwI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 424w, https://substackcdn.com/image/fetch/$s_!qUwI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 848w, https://substackcdn.com/image/fetch/$s_!qUwI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 1272w, https://substackcdn.com/image/fetch/$s_!qUwI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qUwI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png" width="931" height="628" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:628,&quot;width&quot;:931,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:877120,&quot;alt&quot;:&quot;MatchesFashion homepage&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MatchesFashion homepage" title="MatchesFashion homepage" srcset="https://substackcdn.com/image/fetch/$s_!qUwI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 424w, https://substackcdn.com/image/fetch/$s_!qUwI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 848w, https://substackcdn.com/image/fetch/$s_!qUwI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 1272w, https://substackcdn.com/image/fetch/$s_!qUwI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f1ae327-ad0f-4e59-870c-85f1752dc9a8_931x628.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">MatchesFashion homepage</figcaption></figure></div><div><hr></div><h3>What may happen now, and why it&#8217;s key to use data to monitor the situation</h3><p>The shutdown of MatchesFashion may have several key consequences for other online luxury retailers in the short and mid-term:</p><ol><li><p><strong>Market Redistribution</strong>: Competitors may experience an influx of customers seeking alternatives. This could boost sales and market share for those able to capture the displaced customer base, shifting competitive dynamics.</p></li><li><p><strong>Supply Chain Impact</strong>: Current Matches inventory will need to find its way into the off-price market, so there might be bargains to be had. 
Brands distributed by Matches might seek new partnerships and renegotiate terms.</p></li><li><p><strong>Price and Discounting Strategies</strong>: Initially, there might be an increase in promotional activities as retailers attempt to attract MatchesFashion's former customers. However, in the mid-term, there could be a reassessment of discounting strategies as the market stabilizes and retailers focus on brand value and customer loyalty.</p></li><li><p><strong>Investor Sentiment</strong>: The closure of MatchesFashion, coupled with Farfetch's delisting, could dampen investor enthusiasm for the online luxury retail space, especially for companies reliant on a similar multibrand strategy. This might lead to more cautious investment, both from public market investors and private equity firms, impacting the availability of capital for growth and expansion.</p></li><li><p><strong>Market Perception</strong>: The broader perception of the online luxury retail market may shift, prompting companies to more clearly articulate their value propositions and differentiation strategies. There could be an increased focus on sustainability, exclusivity, and personalized customer experiences as key drivers of value in the eyes of both consumers and investors.</p></li></ol><div><hr></div><h3>What data to use if you are a retailer, a brand, a supply chain operator, or an investor</h3><p>In the wake of MatchesFashion's closure, three critical data points demand heightened attention from <strong>competitors</strong>, <strong>brands</strong>, <strong>supply chain operators</strong>, and <strong>investors</strong>: </p><ul><li><p><strong>Prices:</strong> Pricing (in)discipline is the first factor that pulls consumer attention, so it will be a key element for all players to monitor. 
</p></li><li><p><strong>Discounts</strong> play an even more crucial role in driving consumer traffic and sales volumes, especially in a shifting landscape where capturing MatchesFashion's former customer base becomes a strategic priority. </p></li><li><p><strong>Inventory levels</strong> offer insight into the stock held by MatchesFashion and its competitors.</p></li></ul><p>Each data point provides strategic insights that can help stakeholders adapt to market changes, identify opportunities, and make informed decisions amidst the evolving competitive dynamics following MatchesFashion's exit from the market.</p><p><strong>Geographical coverage</strong>: luxury fashion is a global market, and prices and discounts vary significantly by region, so we suggest global coverage (USA, Canada, UK, France, Germany, Turkey, Middle East, Singapore, China, Japan, and Australia), refreshed at least weekly.</p><p>Websites: we recommend monitoring at least the top 10 MatchesFashion competitors listed here:</p><ol><li><p><strong><a href="http://farfetch.com/">Farfetch</a></strong>: A leading global platform for luxury fashion, also in troubled waters.</p></li><li><p><strong><a href="https://net-a-porter.com/">Net-a-Porter</a></strong>: Part of the YOOX NET-A-PORTER GROUP, owned by Richemont.</p></li><li><p><strong><a href="https://www.mytheresa.com/">Mytheresa</a></strong>: A Munich-based luxury e-commerce platform, <a href="https://finance.yahoo.com/quote/MYTE/key-statistics">publicly listed</a>.</p></li><li><p><strong><a href="https://www.ssense.com/">SSENSE</a></strong>: A Canadian-based online retailer.</p></li><li><p><strong><a href="https://www.modaoperandi.com/">Moda Operandi</a></strong>: Pre-order of runway collections, as well as in-season items.</p></li><li><p><strong><a href="https://www.luisaviaroma.com/">Luisa Via Roma</a></strong>: An Italian retailer.</p></li><li><p><strong><a href="https://yoox.com/">Yoox</a></strong>: Part of the YOOX NET-A-PORTER GROUP, Yoox is 
known for its extensive range of end-of-season stock.</p></li><li><p><strong><a href="https://www.giglio.com/">Giglio.com</a></strong>: An Italian online fashion retailer.</p></li><li><p><strong><a href="https://www.cettire.com/">Cettire</a></strong>: A global online marketplace, <a href="https://finance.yahoo.com/quote/CTT.AX/key-statistics">listed in Australia</a>.</p></li><li><p><strong><a href="https://www.italist.com/">Italist</a></strong>: A global online marketplace. Together with Cettire, it has been capturing some of the boutiques leaving Farfetch.</p></li></ol><div><hr></div><h2>Where to get this data ASAP</h2><p><a href="https://www.databoutique.com/">Databoutique.com</a>: It is the fastest way to access web-scraped data. It&#8217;s already there, clearly priced and used by hundreds of companies.</p><p>You could also do the collection yourself, but by the time you&#8217;re ready, the crisis will be over.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Go to Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Go to Data Boutique</span></a></p><div><hr></div><h2>Subscribe and spread the word</h2><p>If you like to collect, use, analyze, squeeze, or torture data, join this newsletter and spread the word. Want to speak to us? 
Join <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[How to Estimate Sales Using Inventory Web-Scraped Data ]]></title><description><![CDATA[A powerful use case for web data]]></description><link>https://blog.databoutique.com/p/how-to-estimate-sales-using-inventory</link><guid isPermaLink="false">https://blog.databoutique.com/p/how-to-estimate-sales-using-inventory</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Mon, 04 Mar 2024 05:01:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a data marketplace focused on web scraping. 
We make it simpler to match those who collect data with those who know how to use it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>How to Estimate Sales Using Inventory Web-Scraped Data</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eo1d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eo1d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!eo1d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!eo1d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!eo1d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!eo1d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:464360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eo1d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!eo1d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!eo1d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!eo1d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9903c798-dba6-45f2-8686-1d41b836bd9b_1024x1024.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>What is web-scraped inventory data</h2><p>Some e-commerce websites disclose stock levels of their available products. </p><p>Some do it to offer better customer service, like IKEA, or to communicate scarcity and give a sense of urgency, like the sneakerhead mecca StockX. Other websites sell one-item-only products, like the second-hand clothing website Vestiaire Collective or Tesla&#8217;s internal used vehicle store. 
Others, even if not displaying the information graphically, have it in plain HTML code:</p><blockquote><p><em>Inventory data can be accessed with web-scraping techniques and be a <strong>powerful source</strong> for <strong>competitive and investment</strong> intelligence</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_C2P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_C2P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 424w, https://substackcdn.com/image/fetch/$s_!_C2P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 848w, https://substackcdn.com/image/fetch/$s_!_C2P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 1272w, https://substackcdn.com/image/fetch/$s_!_C2P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_C2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png" width="1456" height="795" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:795,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2380593,&quot;alt&quot;:&quot;IKEA.com inventory availability by store&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="IKEA.com inventory availability by store" title="IKEA.com inventory availability by store" srcset="https://substackcdn.com/image/fetch/$s_!_C2P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 424w, https://substackcdn.com/image/fetch/$s_!_C2P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 848w, https://substackcdn.com/image/fetch/$s_!_C2P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 1272w, https://substackcdn.com/image/fetch/$s_!_C2P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80b87e7a-19cf-4683-9d69-02d432ed2632_3793x2070.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">IKEA.com inventory availability by store</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZCj4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZCj4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 424w, 
https://substackcdn.com/image/fetch/$s_!ZCj4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 848w, https://substackcdn.com/image/fetch/$s_!ZCj4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 1272w, https://substackcdn.com/image/fetch/$s_!ZCj4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZCj4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png" width="1456" height="714" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1483742,&quot;alt&quot;:&quot;StockX.com quantity indication&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="StockX.com quantity indication" title="StockX.com quantity indication" srcset="https://substackcdn.com/image/fetch/$s_!ZCj4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 424w, 
https://substackcdn.com/image/fetch/$s_!ZCj4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 848w, https://substackcdn.com/image/fetch/$s_!ZCj4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 1272w, https://substackcdn.com/image/fetch/$s_!ZCj4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f9c9f4d-ca54-41d2-ab3f-13de970faa4b_3803x1866.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">StockX.com quantity indication</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rSKr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rSKr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 424w, https://substackcdn.com/image/fetch/$s_!rSKr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 848w, https://substackcdn.com/image/fetch/$s_!rSKr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 1272w, https://substackcdn.com/image/fetch/$s_!rSKr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rSKr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png" width="1456" height="712" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:712,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2051084,&quot;alt&quot;:&quot;Tesla used vehicles car inventory&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tesla used vehicles car inventory" title="Tesla used vehicles car inventory" srcset="https://substackcdn.com/image/fetch/$s_!rSKr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 424w, https://substackcdn.com/image/fetch/$s_!rSKr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 848w, https://substackcdn.com/image/fetch/$s_!rSKr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 1272w, https://substackcdn.com/image/fetch/$s_!rSKr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06708896-4ad6-4828-ba12-6b4e2ae84c12_4414x2158.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Tesla used vehicles car inventory</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SbLb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SbLb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 424w, 
https://substackcdn.com/image/fetch/$s_!SbLb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 848w, https://substackcdn.com/image/fetch/$s_!SbLb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 1272w, https://substackcdn.com/image/fetch/$s_!SbLb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SbLb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png" width="642" height="234.57692307692307" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:532,&quot;width&quot;:1456,&quot;resizeWidth&quot;:642,&quot;bytes&quot;:88504,&quot;alt&quot;:&quot;Website with stock info in the source code&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Website with stock info in the source code" title="Website with stock info in the source code" srcset="https://substackcdn.com/image/fetch/$s_!SbLb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 424w, 
https://substackcdn.com/image/fetch/$s_!SbLb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 848w, https://substackcdn.com/image/fetch/$s_!SbLb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 1272w, https://substackcdn.com/image/fetch/$s_!SbLb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b351c93-8b0f-4eeb-a608-91148097e92b_1601x585.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">A website with stock info in the source code</figcaption></figure></div><div><hr></div><h2>How to estimate sales from inventory data</h2><p>Inventory is point-in-time information on the quantity of an item. Monitoring how this quantity changes gives insights into how many items have been sold. 
</p><p>IKEA example: the quantity of the <a href="https://www.ikea.com/it/it/p/blanda-blank-ciotola-inox-20057255/">metal bowl BLANDA BLANK</a> (see picture) in a Milan store is 2,439 on the 26th of February 2024.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-dfZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-dfZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 424w, https://substackcdn.com/image/fetch/$s_!-dfZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 848w, https://substackcdn.com/image/fetch/$s_!-dfZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 1272w, https://substackcdn.com/image/fetch/$s_!-dfZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-dfZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png" width="578" height="279.4725274725275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:704,&quot;width&quot;:1456,&quot;resizeWidth&quot;:578,&quot;bytes&quot;:233689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-dfZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 424w, https://substackcdn.com/image/fetch/$s_!-dfZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 848w, https://substackcdn.com/image/fetch/$s_!-dfZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 1272w, https://substackcdn.com/image/fetch/$s_!-dfZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1850c4f1-58f2-4f00-8673-254e39bc339f_1747x845.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The next day, this value is 2,418, which is 21 units lower.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d9Fx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d9Fx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 424w, https://substackcdn.com/image/fetch/$s_!d9Fx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 848w, 
https://substackcdn.com/image/fetch/$s_!d9Fx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 1272w, https://substackcdn.com/image/fetch/$s_!d9Fx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d9Fx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png" width="590" height="329.03846153846155" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1456,&quot;resizeWidth&quot;:590,&quot;bytes&quot;:195584,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d9Fx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 424w, https://substackcdn.com/image/fetch/$s_!d9Fx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 848w, 
https://substackcdn.com/image/fetch/$s_!d9Fx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 1272w, https://substackcdn.com/image/fetch/$s_!d9Fx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed3a7c42-1048-4eff-8e75-ab5b03b448be_1517x846.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p> </p><p>We can assume that in the time between our observations, in the Milan store of IKEA, 21 metal bowls BLANDA BLANK were sold at 
6.95 EUR each, for a gross value of about 146 EUR.</p><p>Suppose we repeat this for every item in the store, for every IKEA store worldwide, for every day of the year: <em><strong>we would have a real-time, hyper-detailed revenue estimate for IKEA</strong></em>.</p><h3>Assumptions</h3><p>Because inventory changes for many reasons, not only sales, some assumptions must be made to interpret the data correctly. The key points we need to consider are:</p><ol><li><p><strong>Frequency</strong>: The more frequent the observations, the better. Daily (or intraday) capture is desirable, but it can drive up costs. I worked on a project on a specific luxury website and found that weekly snapshots were 98% as reliable as 20-minute snapshots (at a 99.7% lower cost!). Industry-specific considerations apply: fast-moving consumer goods behave differently from luxury bags.</p></li><li><p><strong>Number of stock points</strong>: We need to understand (or at least estimate) how many stock points (warehouses) serve a specific website. If a website is served by a central warehouse for all of Europe, we need only one extraction (e.g., France) that represents all the others. But if, as in the IKEA case, there is one stock point for every store, we need a separate extraction for each one.</p></li><li><p><strong>Type of stock points</strong>: Some websites disclose both online and offline warehouses, so we have visibility into online and offline sales; others disclose only the online warehouses, leaving us with zero visibility into what happens offline. This is a material difference when we want to understand what kind of revenue we are trying to estimate.</p></li><li><p><strong>Restocking policies</strong>: A good awareness of the industry's restocking policies is key for data interpretation. The more we know, the more accurate our estimate will be. Are goods restocked once or twice every season (as in luxury goods) or at intraday frequency? 
This has a direct impact on our choice of frequency and data interpretation.</p></li><li><p><strong>Inventory transfers</strong>: This is similar to restocking, except that the goods move from one warehouse to another. We might notice 1,000 fewer units in IKEA store A and, two days later, 1,000 more units of the same item in IKEA store B.</p></li><li><p><strong>&#8220;Dead stock&#8221; management (unsold inventory):</strong> What happens when a product reaches the end of its shelf life? Monitoring this is key for some important ESG-related metrics, like unsold inventory.</p></li><li><p><strong>Canceled/Returned orders inventory management:</strong> What happens when an order is canceled or returned? How does it appear in inventory? We found this behavior can be a source of a lot of noise in the data if interpreted incorrectly. </p></li><li><p><strong>Noise:</strong> Data is only as good as the system that generates it. Expect delays, lags, misalignments, coding errors, and manual data entry errors, and be prepared with a model that takes them into account.</p></li></ol><div><hr></div><h2>Differences from other data sources</h2><p>Before building a dataset from web scraping, it is worth understanding how it compares to alternative datasets in the market. </p><p>One popular dataset among professional investors is <a href="https://datarade.ai/data-categories/credit-card-transaction-data">credit (and debit) card transaction data</a>. 
This data contains financial transactions of specific user groups.</p><p>Credit card data is more reliable in terms of absolute numbers, has a consistent history, and will be more stable over time, but web-scraped data can offer deeper insights. Given how much less it costs to collect, in our experience it is a valid complement, if not an alternative, to credit card data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rZmg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rZmg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 424w, https://substackcdn.com/image/fetch/$s_!rZmg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 848w, https://substackcdn.com/image/fetch/$s_!rZmg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 1272w, https://substackcdn.com/image/fetch/$s_!rZmg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rZmg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png" 
width="1456" height="1747" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1747,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:652835,&quot;alt&quot;:&quot;credit card data vs web scraped inventory-based transaction data&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="credit card data vs web scraped inventory-based transaction data" title="credit card data vs web scraped inventory-based transaction data" srcset="https://substackcdn.com/image/fetch/$s_!rZmg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 424w, https://substackcdn.com/image/fetch/$s_!rZmg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 848w, https://substackcdn.com/image/fetch/$s_!rZmg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 1272w, https://substackcdn.com/image/fetch/$s_!rZmg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6cead4a-c959-4eaa-a363-a1c781afb127_3780x4536.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">credit card data vs web-scraped inventory-based transaction data</figcaption></figure></div><h3>Data Quality</h3><ul><li><p><strong>Scope</strong>: Credit card data works the same on every website, regardless of whether they disclose inventory. Web-scraped data requires the website to disclose it.</p></li><li><p><strong>Reliability</strong>: Credit card data are real transactions, while inventory-based calculations are estimates and heavily linked to the assumptions made.</p></li><li><p><strong>Completeness</strong>: Credit card data are usually limited by card or bank&#8217;s brand, often related to the demographics of the user base. 
On the other hand, web-scraped data can show all transactions but is limited to the available warehouses.</p></li><li><p><strong>Cross-retailer consistency</strong>: Credit card data have the same feed regardless of the website. Web scraping depends on the website and what kind of information is disclosed.</p></li><li><p><strong>Time consistency</strong>: Credit card data is a consistent feed (doesn&#8217;t change structure) over time. By contrast, web scraping is exposed to website changes and anti-bot technology. If a day&#8217;s collection is missed, it cannot be recovered. </p></li><li><p><strong>Historical data</strong>: Credit card data go back in time for years. Web scraping is only as old as the day you start doing it. This makes accumulated history a differentiator for those who start collecting early.</p></li></ul><h3>Use Cases</h3><ul><li><p><strong>Transactions by demographics</strong>: Credit card data can be enriched with (anonymized) consumer data, giving insights into demographics. Web scraping has no visibility into the demand side and can only make assumptions based on the change in supply.</p></li><li><p><strong>Transactions by geography</strong>: Credit card data can be enriched with location data (for physical transactions) or (anonymized) user data for online transactions. Web scraping can go only as far as the location of the underlying stock point.</p></li><li><p><strong>Transactions by product</strong>: Credit card data has almost no data on the products within an order; it sees the transaction as a whole. Web scraping, by contrast, has visibility into products, categories, composition, descriptions, prices, and discounts.</p></li><li><p><strong>Average Order Value (AOV)</strong>: Credit card data show the order as a whole, giving insights into average order value (AOV), whereas web-scraped data cannot show whether a product was purchased alone or together with others.</p></li><li><p><strong>Margins and discounts</strong>: Credit card data only shows the price paid. 
Web-scraped data have visibility into the discount offered and can be very valuable for estimates or models of margins.</p></li><li><p><strong>Inventory Turnover Ratio (ITR)</strong>: Web-scraped data provides the inputs for all inventory KPIs, including ITR and sales velocity.</p></li><li><p><strong>Working Capital Invested in Inventory: </strong>Once the cost (or value) of the items stored in a warehouse is embedded in the model, we can have a real-time estimate of the working capital invested in it.</p></li><li><p><strong>Unsold inventory (ESG related)</strong>: Unsold inventory is a huge topic in many industries, with big implications for ESG. Web-scraped data can measure this, whereas credit card data offers no insight into it.</p></li><li><p><strong>Stock Breaks</strong>: When an item is out of stock, it stops generating sales. Web-scraped data can identify this.</p></li></ul><h3>ROI - Return on Investment of Data</h3><ul><li><p><strong>Time-to-Market (TTM)</strong>: The time from when you get the data until you can actually start using it favors credit card data, because it is ready to use. Web-scraped data requires an initial time investment to build history.</p></li><li><p><strong>Cost of data</strong>: The big downside of credit card data is its cost. Web scraping, especially when accessed via marketplaces like Data Boutique, has data costs that are orders of magnitude lower.</p></li><li><p><strong>Differentiation</strong>: The big point. Credit card data are extremely popular and widespread in the investment community, offering little differentiation. 
Web scraping, given the investment in time it takes to build history and the fact that every website is a separate investment, is still underused.</p></li></ul><h2>Conclusions</h2><p>The reason for this deep dive into inventory data is that we are releasing the <a href="https://www.databoutique.com/buy-schema-datasets?recordId=recFl4cf3VzsSSOwE">Ecommerce Inventory Schema</a> on Data Boutique, where this kind of data can be accessed more quickly and easily than by scraping it first-hand. </p><p>Access to web-scraped inventory data can be a differentiating asset for financial analysts, market researchers, and consumer goods competitive intelligence teams.</p><p>We hope this piece helps contextualize the power and opportunity of web-scraped inventory data for a future set of applications that use it.</p><p>If you&#8217;re interested in collecting inventory data (and potentially selling it on Data Boutique), we recommend these readings:</p><ul><li><p><a href="https://substack.thewebscraping.club/p/scraping-inventory-level">The Web Scraping Club THE LAB #27</a></p></li><li><p><a href="https://substack.thewebscraping.club/p/the-lab-28-deep-dive-on-inventory">The Web Scraping Club THE LAB #28</a></p></li></ul><div><hr></div><h2>About the Project</h2><p>Data Boutique aims to increase web data adoption by creating a win-win environment for data sellers and buyers. If you operate in data, join our community. 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Visit Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Visit Data Boutique</span></a></p><div><hr></div><p>Thanks for reading and helping our community grow.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[Tracking Farfetch [Part 2]: Kering's Exit]]></title><description><![CDATA[A case study for web data]]></description><link>https://blog.databoutique.com/p/tracking-farfetch-part-2-kerings</link><guid isPermaLink="false">https://blog.databoutique.com/p/tracking-farfetch-part-2-kerings</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Wed, 28 Feb 2024 07:45:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZM9t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a data marketplace focused on web scraping. 
We make it simpler to match those who collect data with those who know how to use it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Tracking Farfetch [Part 2]: Kering's Exit</h1><p>Last week, we wrote about how to use data to monitor Farfetch. This is a follow-up post to see data in action and look closely at the rumors of <a href="https://wwd.com/business-news/retail/kering-pulls-brands-off-farfetch-wake-sale-coupang-1236171102/">Kering&#8217;s brands exiting Farfetch</a> (new rumors on <a href="https://www.milanofinanza.it/fashion/farfetch-coupang-chiama-rothschild-per-la-cessione-di-ngg-202402261917418630">Farfetch offloading NGG brands</a> also just came out). </p><p>Last week&#8217;s post:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;17d1e97a-43eb-43ed-904d-674fec8fd872&quot;,&quot;caption&quot;:&quot;About Data Boutique Data Boutique is a data marketplace focused on web scraping. 
We make it simpler to match those who collect data with those who know how to use it.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Tracking How Bad The Situation At Farfetch Really Is&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:95986306,&quot;name&quot;:&quot;Andrea Squatrito&quot;,&quot;bio&quot;:&quot;Data Boutique co-founder and CEO&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/2e3e6ad0-3724-43d5-9507-fdc07f746895_1080x1080.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-02-23T04:59:19.211Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.databoutique.com/p/tracking-how-bad-the-situation-at&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:141610150,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:2,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Data Boutique&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F01280021-8d9b-4d20-8d73-cf0d6d150ed2_1064x1064.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>This is what the situation looked like on the 26th of February, compared to 100 days before:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ZM9t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZM9t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 424w, https://substackcdn.com/image/fetch/$s_!ZM9t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 848w, https://substackcdn.com/image/fetch/$s_!ZM9t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 1272w, https://substackcdn.com/image/fetch/$s_!ZM9t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZM9t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png" width="786" height="1539" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1539,&quot;width&quot;:786,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104772,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZM9t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 424w, https://substackcdn.com/image/fetch/$s_!ZM9t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 848w, https://substackcdn.com/image/fetch/$s_!ZM9t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 1272w, https://substackcdn.com/image/fetch/$s_!ZM9t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fded32bf7-a8b1-4673-a779-d5951d0d9888_786x1539.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>How to read data:</p><ul><li><p><strong>SKU count:</strong> The number of shoppable items on Farfetch aggregated by brand. A negative change means there are fewer products today of that brand than there were 100 days ago</p></li><li><p><strong>Inventory:</strong> Farfetch does not hold inventory, but it displays the inventory that the merchants (boutiques, brands, and retailers) decide to share on Farfetch. When <strong>inventory goes down</strong>, merchants share less on Farfetch today than 100 days ago. When <strong>inventory increases</strong>, brands (or merchants) fill their warehouses and make deeper inventory available on the channel.</p></li><li><p><strong>Countries</strong>: Farfetch allows consumers to &#8220;fetch&#8221; items from boutiques far away (hence &#8220;far-fetch&#8221;), so countries do not mean that the inventory is in that country. It means the merchant makes the inventory shoppable in that country. 
A brand might want to restrict the availability of its products in certain regions while still leaving them available in others.</p></li></ul><p>Business interpretation of the chart:</p><ul><li><p>Kering&#8217;s brand Gucci is reducing its presence on Farfetch, in line with rumors. It will need to replace the sales generated by Farfetch with sales through its direct-to-consumer website.</p></li><li><p>Other Kering brands are also decreasing, but not (yet) as much. This might reflect the different share of retail vs wholesale at those brands.</p></li><li><p>Michael Kors is also lowering exposure, but other brands are more or less stable, if not increasing. Burberry has grown its product availability in China.</p></li><li><p>With the decrease of Gucci's presence, Farfetch is losing a major anchor brand, which, on the one hand, leaves room for others and, on the other hand, decreases the attractiveness of the platform.</p></li></ul><div><hr></div><h2>How to build this report at home</h2><p>This is a simple yet effective use of data. If you want to build your own, monitor other brands, or closely track the situation as it evolves, here&#8217;s what you can do: Ask the right questions.</p><ul><li><p><strong>Should you do it?</strong> If <em>you or one of your clients</em> work for (or have money invested in) Farfetch, competitors of Farfetch, or brands selling on Farfetch, you probably should. If not, then reading the news will suffice.</p></li><li><p><strong>What to look for?</strong> My suggestion is to K.I.S.S. (Keep It Simple, Stupid). We want to see who is jumping ship and who&#8217;s staying in. A clear purpose is the best way to extract real value from the data.</p></li><li><p><strong>How much should you invest in this (ROI)?</strong> It depends on how much skin in the game you have, but <em>spending as little as possible</em> is always good advice: If Farfetch goes belly up, you won&#8217;t need this tool long after. 
I would recommend spending a <strong>maximum of 200 to 300 EUR per month</strong> on data and at most 2 hours per week of a team member&#8217;s time to run this. That is enough to have your reports in your inbox every Monday by noon (you could automate it much more, according to your needs, but we stick to the basics here).</p></li><li><p><strong>What to monitor?</strong> Stick to the bare minimum: SKU count and inventory for the USA, UK, and China, on a weekly or monthly basis. That should do the job.</p></li><li><p><strong>What tools to use?</strong> I wish I could tell you Excel would be enough, but it&#8217;s quite a bit of data and might become bulky to handle. If you have someone on your team who can handle Excel like a pro, give it a try. MS Access, Power BI, Tableau, or any similar tool will do the trick even better.</p></li><li><p><strong>How to do it?</strong> It&#8217;s easier than it looks:</p><ul><li><p>Pick the data: <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recrzRBSLBex4CqlD">SKU</a> and <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recd1dSTula6pX3S1">inventory</a>.</p></li><li><p><strong>Download the files</strong> every week/month (manually or via API).</p></li><li><p><strong>Open the files</strong> with Excel / MS Access / Power BI / Tableau or whatever you have (automate with the API if you know how).</p></li><li><p><strong>Create a pivot table with summary statistics</strong>: Summarize the inventory by brand and country (sum of units) and count the rows (SKU count). </p></li><li><p><strong>Save it in a summary Excel file</strong>, which holds only summary data, and make it pretty so you can send it via email.</p></li><li><p><strong>Delete the files</strong> you downloaded: It saves space. 
Don&#8217;t worry: once purchased, they won&#8217;t disappear, and you can always download them again.</p></li></ul></li></ul><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/buy-data-list-subset/farfetch_web_scraped_data/r/recnDOxe6YYa3C7Ib&quot;,&quot;text&quot;:&quot;Go to data&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/buy-data-list-subset/farfetch_web_scraped_data/r/recnDOxe6YYa3C7Ib"><span>Go to data</span></a></p><p>I hope this hands-on post was helpful.</p><div><hr></div><h2>About the Project</h2><p>Data Boutique aims to increase web data adoption by creating a win-win environment for data sellers and buyers. Join our community. It&#8217;s free. More can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Visit Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Visit Data Boutique</span></a></p><div><hr></div><p>Thanks for reading and helping our community grow.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[Tracking How Bad The Situation At Farfetch Really Is]]></title><description><![CDATA[A case study for web data]]></description><link>https://blog.databoutique.com/p/tracking-how-bad-the-situation-at</link><guid 
isPermaLink="false">https://blog.databoutique.com/p/tracking-how-bad-the-situation-at</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Fri, 23 Feb 2024 04:59:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!u71c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a data marketplace focused on web scraping. We make it simpler to match those who collect data with those who know how to use it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Tracking How Bad The Situation At Farfetch Really Is</h1><p>We&#8217;ll walk through a highly relevant case study for web data in consumer goods.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u71c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u71c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 424w, 
https://substackcdn.com/image/fetch/$s_!u71c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 848w, https://substackcdn.com/image/fetch/$s_!u71c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 1272w, https://substackcdn.com/image/fetch/$s_!u71c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u71c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png" width="1125" height="757" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:757,&quot;width&quot;:1125,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1616871,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u71c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 424w, 
https://substackcdn.com/image/fetch/$s_!u71c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 848w, https://substackcdn.com/image/fetch/$s_!u71c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 1272w, https://substackcdn.com/image/fetch/$s_!u71c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbee34851-9fc2-4a08-a13b-2108d41c67ec_1125x757.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The 4Bn USD problem</h2><p><a href="https://www.farfetchinvestors.com/financial-news/news-details/2023/Farfetch-Announces-Second-Quarter-2023-Results/default.aspx#:~:text=The%20following%20reflects%20Farfetch's%20expectations,from%20%244.1%20billion%20in%202022">4.4bn USD</a> worth of fashion items were sold through Farfetch in 2023, but many in the industry fear that 2024 might mean the end for the fashion marketplace.</p><p>Over the past six months, Farfetch has faced significant challenges, including financial difficulties and strategic missteps that led to its <a href="https://www.businesswire.com/news/home/20231219040201/en/NYSE-to-Commence-Delisting-Proceedings-Against-Farfetch-Limited-FTCH">delisting from the NYSE</a>. Following these events, <a href="https://www.businessoffashion.com/news/luxury/neiman-marcus-ends-partnership-with-farfetch/#:~:text=Neiman%20Marcus%20Group%20on%20Wednesday,also%20not%20join%20Farfetch's%20marketplace.">Neiman Marcus has decided against pursuing technological integration with Farfetch</a>, and <a href="https://wwd.com/business-news/retail/kering-pulls-brands-off-farfetch-wake-sale-coupang-1236171102/">Kering announced its intention to remove its brands</a>, including Saint Laurent, Gucci, Bottega Veneta, and Balenciaga, from the platform. These brands have historically been significant for Farfetch. </p><p>If Farfetch&#8217;s situation were to worsen, several groups of stakeholders would face significant risks: Brands with direct concessions on the platform could lose a vital sales channel, while boutiques might suffer due to their increasing dependency on the online retailer. 
Investors (including the most recent one, Coupang), employees, and consumers also stand to be affected.</p><h2>The speed of information</h2><p>Given the significant interests at stake (4.4Bn USD in sales), it is crucial to understand not only where this is going to end but also how fast it is happening, so that those impacted the most have the best chance to make business-saving decisions. </p><h2>Why use web data</h2><p><em>Short answer</em>: Because it&#8217;s the best, and sometimes the only, source of information many have.</p><p><em>Long answer</em>: Financial results are slow to be reported (<a href="https://www.farfetchinvestors.com/financial-news/news-details/2023/Farfetch-Will-Not-Announce-Third-Quarter-2023-Results/default.aspx">Farfetch delayed announcing quarterly results before the crisis</a>), and insider information is not guaranteed and may be partial, biased, or misleading. </p><p>Credit card transactional data and email receipts are also a great way to track this, if you work in a hedge fund with deep pockets for data, data scientists, and data infrastructure. This is not the case for the rest of the class.</p><p>What stands out in the open, for everyone to see, is how the website behaves towards its customers. This is the ultimate touchpoint between the platform and the source of its revenue: customers.</p><h2>What data</h2><h3><strong>Traffic data</strong> </h3><p>This is not strictly web data (at Data Boutique, we are often asked if it can be scraped&#8230; no, it can&#8217;t), but I&#8217;ll include it anyway. It is extremely relevant and also accessible. <a href="https://www.similarweb.com/">Similarweb</a> is a great place to start. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vDmE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vDmE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 424w, https://substackcdn.com/image/fetch/$s_!vDmE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 848w, https://substackcdn.com/image/fetch/$s_!vDmE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 1272w, https://substackcdn.com/image/fetch/$s_!vDmE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vDmE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png" width="1370" height="564" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:564,&quot;width&quot;:1370,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51181,&quot;alt&quot;:&quot;Similarweb data on 
farfetch&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Similarweb data on farfetch" title="Similarweb data on farfetch" srcset="https://substackcdn.com/image/fetch/$s_!vDmE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 424w, https://substackcdn.com/image/fetch/$s_!vDmE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 848w, https://substackcdn.com/image/fetch/$s_!vDmE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 1272w, https://substackcdn.com/image/fetch/$s_!vDmE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5edf041-6aba-42d2-8f34-04b82a7a70ec_1370x564.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 
15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Similarweb data on Farfetch</figcaption></figure></div><p>Website traffic gives a timely measurement of the footfall of the website, which is critical to have a pulse on how the <em>demand side</em> of the business is responding.</p><h3><strong>Web scraped data</strong> </h3><p>Web scraped data tells us how the <em>supply side</em> of the business is acting. Let&#8217;s see what web data can tell us and where to find it:</p><h4><strong>Brand presence</strong></h4><p>How fast is Kering pulling out from Farfetch? In which geographies do they start first? Where will they leave next? 
Which other fashion groups or brands will be next?</p><p>The basic e-commerce data on Farfetch (schema E0001) for the <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recrzRBSLBex4CqlD">UK</a>, <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recyqY2za5wcUsOTU">USA</a>, <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recPTVSfwJB8lHsn0">Europe</a>, <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recuxUHGmwZ5d9Npx">China</a>, and <a href="https://www.databoutique.com/buy-data-page-detail?recordId=rectY7efz3IuPwsZK">Japan</a> are the perfect (and affordable) place to get this info.</p><h4><strong>Discounts</strong></h4><p>When things go south in the wild e-commerce world, discounting and promotion are often the easiest levers to pull, so watching for when and where they will be pulled is key. Discounting also affects Farfetch's competitors: once someone starts, the others are forced to follow, in a domino effect.</p><p>Get the same data mentioned above; it will also cover this.</p><h4><strong>Price discipline</strong></h4><p>Discounting is not the only price hack common in fashion. Merchants going rogue with price arbitrage, especially in faraway countries that are difficult to track, can be a challenge. To track this, we will need, in addition to the E0001 datasets mentioned above, the <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recL5NwV1ruqti1JE">E-ADD-CODE-0001 dataset</a>, which provides the original SKU numbers (called Brand ID on Farfetch) that allow cross-platform product matching. </p><h4><strong>Concessions</strong></h4><p>Who are Farfetch's merchants? Will they stay, or will they go? <em>When</em> will they jump ship? 
Although Farfetch does not disclose merchants&#8217; names as clearly as other players like Yoox or Zalando do, this information is still visible and accessible from the same E0001 datasets mentioned for discounts and brand presence (if you have any trouble making it work, feel free to reach out). </p><h4><strong>Inventory</strong></h4><p>Inventory depth is also very relevant in retail analysis. The newly released <a href="https://www.databoutique.com/buy-data-page-detail?recordId=rechi2MyasIotfU3N">dataset E-INVENTORY</a> for Farfetch covers this topic. Although Farfetch does not own inventory, it shows the depth of stock that brands in concession and merchants decide to share on Farfetch. Super hot dataset, used for GMV estimates as well. </p><div><hr></div><h2>Conclusions</h2><p>Whether you work in a large corporation with great data analysis capabilities or a small venture, access to web data can offer valuable, timely insights on how this farfetched crisis is evolving.</p><div><hr></div><h2>About the Project</h2><p>Data Boutique aims to increase web data adoption by creating a win-win environment for data sellers and buyers. Join our community. It&#8217;s free. 
More can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Visit Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Visit Data Boutique</span></a></p><div><hr></div><p>Thanks for reading and helping our community grow.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[The hidden costs of asking for quotations in the data market]]></title><description><![CDATA[Working towards faster, cheaper and safer data projects]]></description><link>https://blog.databoutique.com/p/the-hidden-costs-of-asking-for-quotations</link><guid isPermaLink="false">https://blog.databoutique.com/p/the-hidden-costs-of-asking-for-quotations</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Fri, 09 Feb 2024 15:57:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!YopL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a data marketplace focused on web scraping. 
We make it simpler to match those who collect data with those who know how to use it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>The hidden costs of asking for quotations in the data market</h1><p>Asking a vendor for a quotation is a back-and-forth hidden-costs-generating process with many inefficiencies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YopL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YopL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!YopL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!YopL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!YopL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YopL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:329434,&quot;alt&quot;:&quot;The hidden costs of the quotation process&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The hidden costs of the quotation process" title="The hidden costs of the quotation process" srcset="https://substackcdn.com/image/fetch/$s_!YopL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!YopL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!YopL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!YopL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f7e05c6-ee9b-4aa6-9fb0-9092667e98d6_1024x1024.webp 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Building complex solutions involves asking vendors for packages that are not priced yet</h2><p>Asking for a quotation in B2B projects is often unavoidable. 
</p><p>Whenever we build new or tailor-made solutions, we need to buy things in packages that have not yet been priced, produced, or assembled the way we need them, and data is a frequent case.</p><p>Often, quotations cover components of larger projects, which in turn need a cost estimate of their own.</p><p>Examples: A firm bidding for a tender and asking a vendor for data collection costs; the IT department researching the market after being asked by Marketing for a BI solution; a startup evaluating the economics of different technology stacks to adopt.</p><p>It all revolves around trying to understand the cost of something that does not (yet) have a price tag on it.</p><h3>Quotations are a cost for sellers&#8230;</h3><p>The process represents a cost for sellers: The time spent deciphering poorly written briefs and designing packages for requests they have little information on ultimately adds to their operating costs.</p><p>Once the quotation is submitted, the wait for feedback from the entire decision chain begins, punctuated by reminders, check-in emails, and rescheduled calls.</p><p>Then come scope revisions, requests for alternative scenarios, and requests for bulk discounts. Despite all this, the project might still be lost, as often happens.</p><p>This effort is part of the seller&#8217;s CAC (Customer Acquisition Cost), which is eventually paid by buyers, as it is factored into the price.</p><h3>&#8230; and for buyers too</h3><p>Inefficiencies also hit buyers: Time and resources are spent writing down the requests, reading and scoring responses, and asking for revisions and alternative scenarios.</p><p>A long quotation process means it takes longer to buy anything, which means we either buy less frequently (and build fewer solutions) or hire more staff to do it. 
Both cases kill ROI.</p><p>From whatever perspective you look at this, it&#8217;s a dollar-killing inefficiency.</p><h3>Unbearable for small-size trades</h3><p>While this is acceptable in illiquid markets (as illiquid as a once-every-five-year-million-dollar deal), it becomes an unbearable friction for small to mid-size trades (hundreds or a few thousand dollars multiple times a year). </p><p>Web scraping data - the business we&#8217;re in - falls in the latter case.</p><div><hr></div><h2>How we solved it: Schemas and Bundles</h2><p>Since our mission is to make access to data faster, cheaper, and safer, solving this money-burning problem was a priority.</p><p>What we did was the following:</p><ol><li><p>Standardizing the building blocks</p></li><li><p>Empowering buyers to play before they buy</p></li></ol><div><hr></div><h3>1. The building blocks (Schemas)</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kWDm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kWDm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!kWDm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!kWDm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!kWDm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kWDm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256498,&quot;alt&quot;:&quot;Simple data blocks are the foundation of Data Boutique&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Simple data blocks are the foundation of Data Boutique" title="Simple data blocks are the foundation of Data Boutique" srcset="https://substackcdn.com/image/fetch/$s_!kWDm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!kWDm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 848w, 
https://substackcdn.com/image/fetch/$s_!kWDm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 1272w, https://substackcdn.com/image/fetch/$s_!kWDm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e94a359-04b4-4440-81fe-b894c39245b3_1024x1024.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Simple data blocks are the foundation of Data Boutique</figcaption></figure></div><p>Simplifying starts from the 
foundations: To make the process smoother, we had to identify simple, modular building blocks for our data use cases. The tricky part was being simple <em>and</em> valuable at the same time.</p><p>Standard schemas provided the answer: Buyer-oriented data structures that serve &#8220;atomic&#8221; data needs. More complex solutions can be delivered simply by assembling several schemas.</p><p>Working with web-scraped data made this part easier: Websites usually come in standardizable categories, such as e-commerce, classifieds, store locators, travel and booking, and so on.</p><p>When you buy an &#8220;E0001 Schema&#8221;, you know what you get, regardless of the website. This took a lot of complexity out of the equation and made life easier for everyone: Buyers <em>and</em> sellers.</p><div><hr></div><h3>2. Play before you buy (Private Bundles)</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WAwK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WAwK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!WAwK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!WAwK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!WAwK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WAwK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:269986,&quot;alt&quot;:&quot;Play before you buy with private bundles&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Play before you buy with private bundles" title="Play before you buy with private bundles" srcset="https://substackcdn.com/image/fetch/$s_!WAwK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 424w, https://substackcdn.com/image/fetch/$s_!WAwK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 848w, https://substackcdn.com/image/fetch/$s_!WAwK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!WAwK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa357008e-fb2f-4082-a381-2fbaeb8cb209_1024x1024.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Once we had modularity, creating complex configurations was the natural next step:</p><p>Data requests like &#8220;<em>a dataset to monitor daily prices and promotions of H&amp;M products in France</em>&#8221; or &#8220;<em>a dataset to measure inflation in Germany, the UK, the USA, Canada, China, and Japan for all Kering brands</em>&#8221; become an exercise in 
assembly.</p><p>Inspired by tools like the <a href="https://calculator.aws/#/">AWS Pricing Calculator</a>, we created <strong>Private Bundles</strong>, a feature where users can simulate their data packages.</p><p>With <strong>Private Bundles</strong>, users can change the <strong>scope</strong> and <strong>frequency</strong>, assemble atomic blocks as they need, <strong>save</strong> their selections, and <strong>share</strong> them with their team, with no commitment to buy. </p><blockquote><p><em>Private Bundles are a safe environment where users can experiment with different ideas before committing to buy</em></p></blockquote><p>A Private Bundle lets you:</p><ul><li><p><strong>Start planning on initial ideas</strong> and save them for later, defining data specs as you go and keeping draft versions as more information comes in;</p></li><li><p><strong>Have a real-time cost projection</strong>, and see how it changes when you choose different data types, vendors, data refresh frequency, and content;</p></li><li><p>Factor in <strong>volume discounts</strong>, as they are automatically included in the calculation;</p></li><li><p><strong>Collaborate with other team members</strong> on its definition, <strong>sharing the final draft with the decision-makers</strong>. </p></li></ul><h3>Speed up the design of data projects</h3><p>The speed of project design improves significantly, as a buyer can formulate, correct, and rephrase their project brief directly on the platform, obtaining a real-time cost quotation.</p><p>To <strong>save even more time</strong>, you can start from a Public Bundle, clone it, and work from there: <a href="https://www.databoutique.com/buy-bundle-detail/fast+fashion+usa+weekly/r/reczmu7XrXkmORe1C">Example of a public bundle</a>.</p><p>Often, buyers need to evaluate multiple scenarios, including data costs, before they can make a decision. 
</p><p>We make this process faster.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Go to Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Go to Data Boutique</span></a></p><div><hr></div><h2>About the Project</h2><p>Data Boutique aims to increase web data adoption by creating a win-win environment for data sellers and buyers. Join our community. It&#8217;s free. More can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Visit Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Visit Data Boutique</span></a></p><div><hr></div><p>Thanks for reading and helping our community grow.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[What's New on Data Boutique]]></title><description><![CDATA[Noteworthy stuff from the web data platform]]></description><link>https://blog.databoutique.com/p/whats-new-on-data-boutique</link><guid isPermaLink="false">https://blog.databoutique.com/p/whats-new-on-data-boutique</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Tue, 23 Jan 2024 05:01:33 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!82Rn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a data marketplace focused on web scraping. We bring together those who collect data with those who know how to use it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>What&#8217;s New on Data Boutique</h1><p><em>PREFACE TO THIS EDITION</em></p><blockquote><p><em>This edition opens a new series of posts on relevant content shared on the web-scraped data exchange during the past month.</em></p></blockquote><div><hr></div><h5>Table of Contents</h5><h5>1. For data buyers</h5><ul><li><p><strong>Case Study</strong>: How to match products on Farfetch</p></li><li><p><strong>New Datasets</strong>: Cosmetics on Sephora</p></li><li><p><strong>The Cost of Data</strong>: Asked price stats by schema</p></li></ul><h5>2. For data sellers</h5><ul><li><p><strong>Feature</strong>: Your Profile Pages on Data Boutique</p></li><li><p><strong>Tips</strong>: Automating Upload and Validations on the Data Boutique S3 Bucket</p></li></ul><div><hr></div><h2>1. 
For data buyers</h2><h3>Case Study: How to match products on Farfetch</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!82Rn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!82Rn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 424w, https://substackcdn.com/image/fetch/$s_!82Rn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 848w, https://substackcdn.com/image/fetch/$s_!82Rn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 1272w, https://substackcdn.com/image/fetch/$s_!82Rn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!82Rn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png" width="462" height="434.29065743944636" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:867,&quot;resizeWidth&quot;:462,&quot;bytes&quot;:146171,&quot;alt&quot;:&quot;Farfetch ID brand code&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Farfetch ID brand code" title="Farfetch ID brand code" srcset="https://substackcdn.com/image/fetch/$s_!82Rn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 424w, https://substackcdn.com/image/fetch/$s_!82Rn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 848w, https://substackcdn.com/image/fetch/$s_!82Rn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 1272w, https://substackcdn.com/image/fetch/$s_!82Rn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18788a54-0065-4e57-bfbd-c21678a9ae48_867x815.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Farfetch ID brand code</figcaption></figure></div><p><a href="https://www.farfetch.com/">Farfetch</a> is arguably the most relevant marketplace in the online fashion and luxury space, which makes price and promotion monitoring on this website particularly valuable.</p><p>In price monitoring, matching products across retailers is required to measure potential price misalignments between platforms. Multibrand retailers typically use internal product codes, because the apparel industry rarely uses standard codes such as EAN or GTIN, which makes the job harder. </p><p>Farfetch exposes an additional code, the &#8220;ID Brand&#8221; code, which is the closest thing we have to a unique product identifier (SKU). 
We have a <a href="https://www.databoutique.com/buy-schema-detail?recordId=recrfwYdtTJXY3KbW">dedicated schema</a> to capture any additional code e-commerce websites may have: The E-ADD-CODE-0001 schema.</p><h4>Why use the ID Brand code</h4><p>It makes price comparison with other websites easy, especially official brand websites. You can use it to match Farfetch products with those on other websites (available on Data Boutique) or with product list files of your own.</p><h4>How to use the E-ADD-CODE-0001 schema</h4><p>The E-ADD-CODE-0001 schema is essentially a lookup/cross-reference table: It contains the list of product codes used in other schemas, such as E0001, and associates each of them with its &#8220;ID Brand&#8221; code. </p><p>In <a href="https://www.farfetch.com/it/shopping/women/moschino-mocassini-chunky-con-logo-item-20267286.aspx">this example</a>, it associates the product_code 20267286 with the ID Brand code MA10554G0HMJ5. </p><h4>The importance of having a separate schema</h4><p>Why a separate schema? Two reasons:</p><ol><li><p><strong>Cost</strong>: ID Brand codes are visible in Product Detail Pages (PDP), which are more costly to scrape. By keeping them separate, we keep the price information in a cost-efficient schema. The E-ADD-CODE-0001 schema can be accessed in combination with an E0001 schema (plain data on prices): the former just once a month or less, the latter more frequently, which cuts the cost by roughly an order of magnitude.</p></li><li><p><strong>Easier to find, easier to request</strong>: With a separate schema it&#8217;s easier to spot when a website has this information available, with no need to investigate the sample file. A buyer has immediate visibility and, when it is missing, can request an E-ADD-CODE-0001 schema for that website; no further specification is needed. 
<strong>Data discovery</strong> gets a lot smoother this way.</p><p> </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/buy-data-page-detail?recordId=recL5NwV1ruqti1JE&quot;,&quot;text&quot;:&quot;Go to This Dataset&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/buy-data-page-detail?recordId=recL5NwV1ruqti1JE"><span>Go to This Dataset</span></a></p></li></ol><div><hr></div><h3>New Datasets: Cosmetics on Sephora</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!taAj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!taAj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 424w, https://substackcdn.com/image/fetch/$s_!taAj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 848w, https://substackcdn.com/image/fetch/$s_!taAj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!taAj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!taAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg" width="514" height="343.095" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:801,&quot;width&quot;:1200,&quot;resizeWidth&quot;:514,&quot;bytes&quot;:225317,&quot;alt&quot;:&quot;Sephora store&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sephora store" title="Sephora store" srcset="https://substackcdn.com/image/fetch/$s_!taAj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 424w, https://substackcdn.com/image/fetch/$s_!taAj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 848w, https://substackcdn.com/image/fetch/$s_!taAj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!taAj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b2cdb26-8606-4c05-9007-e666975b0cd7_1200x801.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"></div></div></div></a><figcaption class="image-caption">Sephora store</figcaption></figure></div><p>The cosmetics industry is a hot sector for web scraping. The first cosmetics dataset on Data Boutique covers <strong>Sephora</strong>, the leading beauty and cosmetics retailer owned by LVMH. 
Sephora offers beauty, skincare, and fragrance products from various luxury brands.</p><p>The dataset currently lists product prices for <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recvKsTvBJUTzWsPR">France</a> and <a href="https://www.databoutique.com/buy-data-page-detail?recordId=recjhS3DSQL5eqyBj">Poland</a> and is offered by <a href="https://www.databoutique.com/seller-page/WebDataWatch/r/recPcRakqOxMhHlCa">WebDataWatch</a>, one of the most reliable data sellers on Data Boutique.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/buy-data-page-detail?recordId=recvKsTvBJUTzWsPR&quot;,&quot;text&quot;:&quot;Go to This Dataset&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/buy-data-page-detail?recordId=recvKsTvBJUTzWsPR"><span>Go to This Dataset</span></a></p><div><hr></div><h3><strong>The Cost of Data</strong>: Asked price stats by schema</h3><p><strong>The data price</strong> on Data Boutique is set by buyers and sellers simultaneously. 
The factors that influence price are:</p><ul><li><p>Cost of extraction</p></li><li><p>Demand volume and stability</p></li><li><p>Reputation of seller</p></li></ul><p>Since the cost of extraction varies significantly based on the website and the depth of scraping required, it is helpful to look at price statistics by schema.</p><p>Here is a snapshot of what prices look like today on Data Boutique.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bYdm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bYdm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 424w, https://substackcdn.com/image/fetch/$s_!bYdm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 848w, https://substackcdn.com/image/fetch/$s_!bYdm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 1272w, https://substackcdn.com/image/fetch/$s_!bYdm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bYdm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png" 
width="616" height="610.0769230769231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1442,&quot;width&quot;:1456,&quot;resizeWidth&quot;:616,&quot;bytes&quot;:62723,&quot;alt&quot;:&quot;Asked prices for full-website scans by selected schemas on DataBoutique.com, January 2024&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Asked prices for full-website scans by selected schemas on DataBoutique.com, January 2024" title="Asked prices for full-website scans by selected schemas on DataBoutique.com, January 2024" srcset="https://substackcdn.com/image/fetch/$s_!bYdm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 424w, https://substackcdn.com/image/fetch/$s_!bYdm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 848w, https://substackcdn.com/image/fetch/$s_!bYdm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 1272w, https://substackcdn.com/image/fetch/$s_!bYdm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d439f8-107a-4295-a92e-0000a7789e2a_2636x2611.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex 
pc-gap-8 pc-reset"></div></div></div></a><figcaption class="image-caption">Asked prices for full-website scans by selected schemas on DataBoutique.com, January 2024</figcaption></figure></div><div><hr></div><h2>2. For data sellers</h2><h3><strong>Feature</strong>: Your Profile Pages on Data Boutique</h3><p>Starting this month, sellers can have a <strong><a href="https://www.databoutique.com/sellers-list">profile page</a></strong> hosted on Data Boutique, where all datasets offered are grouped together. </p><p>We want data providers to be able to showcase their work and attract leads. 
This can translate into job opportunities inside and outside of Data Boutique.</p><p>Profile Pages are meant to:</p><ul><li><p>Establish the seller&#8217;s brand</p></li><li><p>Showcase their work</p></li><li><p>Enable lead generation</p></li><li><p>Provide a space to share relevant news, marketing material, and updates.</p></li></ul><h4>How to activate the profile page</h4><p>The platform changes quickly, so these instructions may be out of date by the time you read this post:</p><p>Once logged in (sign-up is free), pick &#8220;Selling&#8221; from the top menu bar and then &#8220;My Company Profile&#8221;.</p><p>From here you can manage your public messages, edit the information displayed on the Profile Page, and, most importantly, change the settings to:</p><ul><li><p>Make your Profile Page public</p></li><li><p>Accept direct messages from any Data Boutique user</p></li></ul><p>You can turn either of these options on or off at any time.</p><div><hr></div><h3><strong>Tips</strong>: Automating Upload and Validations on the Data Boutique S3 Bucket</h3><p>When selling data on Data Boutique, data providers need to upload their files to an S3 bucket. An automatic validator tests the content and returns an approved/refused response.</p><p>A huge shoutout goes to <a href="https://www.databoutique.com/seller-page?recordId=recoN5G7UEpH0cSwB">Riccardo Lunardi</a>, a seller from Italy, who shared some Python code to automate this process. Riccardo wrote a Python class that can upload files and check whether they were approved (see the guide for details). 
Thanks a lot, Riccardo&#129303;.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/news?recordId=recvQgYRSuUFZYQvX&quot;,&quot;text&quot;:&quot;Go to Guide&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/news?recordId=recvQgYRSuUFZYQvX"><span>Go to Guide</span></a></p><div><hr></div><h2>About the Project</h2><p>Data Boutique aims to increase web data adoption by creating a win-win environment for data sellers and buyers. Join our community (it&#8217;s free!) to be part of it. You can find more on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Visit Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Visit Data Boutique</span></a></p><div><hr></div><p>That&#8217;s it for this month. 
Thanks for reading and helping our community grow.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[Generic data marketplaces are broken]]></title><description><![CDATA[Misaligned interests and lack of value are the cause]]></description><link>https://blog.databoutique.com/p/generic-data-marketplaces-are-broken</link><guid isPermaLink="false">https://blog.databoutique.com/p/generic-data-marketplaces-are-broken</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Mon, 27 Nov 2023 04:54:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SsvV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data marketplace. 
Unlike general-purpose ones, it addresses specific, well-defined needs in a transparent, interest-aligned space.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>Generic data marketplaces are broken</h1><p>Four years have passed since <a href="https://aws.amazon.com/it/about-aws/whats-new/2019/11/introducing-aws-data-exchange/">Amazon launched AWS Data Exchange</a>, and neither AWS nor any other player has proved to be the accelerating agent for the segment&#8217;s growth that everyone hoped for.</p><p>As of today, the coverage of datasets is minimal: </p><ul><li><p><a href="https://aws.amazon.com/it/data-exchange/">AWS Data Exchange</a> lists 4,109 data products</p></li><li><p><a href="https://app.snowflake.com/marketplace">Snowflake Marketplace</a> lists 2,427 data products</p></li><li><p><a href="https://marketplace.databricks.com/providers">Databricks Marketplace</a> has 143 providers (which include solutions providers as well, not only data providers).</p></li></ul><p>That is a far cry from a market that&#8217;s supposed to be &#8220;the new oil&#8221;.</p><p>Marketplaces fail to capture a significant part of the exchanges. According to Grand View Research, the <a href="https://www.grandviewresearch.com/industry-analysis/data-marketplace-market-report">Data Marketplace global market size</a> in 2023 was <strong>1.2 billion</strong> USD. The <a href="https://www.grandviewresearch.com/industry-analysis/alternative-data-market">Alternative Data Market</a>, a sub-niche of data, was 6 times as large (<strong>7.2 billion</strong> USD), and Web Scraping 5 times as large (<strong>6 billion</strong> USD). 
</p><p>Cross-referencing these reports, marketplaces capture less than 2% of the exchanged volumes.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SsvV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SsvV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 424w, https://substackcdn.com/image/fetch/$s_!SsvV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 848w, https://substackcdn.com/image/fetch/$s_!SsvV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 1272w, https://substackcdn.com/image/fetch/$s_!SsvV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SsvV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png" width="441" height="563.8450704225352" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:639,&quot;resizeWidth&quot;:441,&quot;bytes&quot;:15586,&quot;alt&quot;:&quot;data marketplace market share&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="data marketplace market share" title="data marketplace market share" srcset="https://substackcdn.com/image/fetch/$s_!SsvV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 424w, https://substackcdn.com/image/fetch/$s_!SsvV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 848w, https://substackcdn.com/image/fetch/$s_!SsvV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 1272w, https://substackcdn.com/image/fetch/$s_!SsvV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec473be9-3c42-4fe5-82af-bd0a65dba7c9_639x817.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Volume captured by data marketplaces</figcaption></figure></div><div><hr></div><h2>What&#8217;s the problem here?</h2><p>Large and well-funded corporations have tried to address this, so why hasn&#8217;t it worked out yet? </p><p>As my <a href="https://www.danhock.co/p/transaction-costs">all-time most-quoted essay</a> puts it, <em>the role of marketplaces is to lower transaction costs</em>. </p><p>Marketplaces are only as powerful as the value they bring to the table. And that value has not been enough yet.</p><h3>The marketplace is a pretext for platforms: interests are misaligned</h3><p>Data is platform-agnostic. 
But those who built today&#8217;s major data marketplaces are not: their intent is to push their technologies, not data commerce.</p><p>In fact:</p><ul><li><p>AWS Data Exchange wants you to use the AWS stack</p></li><li><p>Snowflake Data Marketplace works as long as you are a Snowflake user</p></li><li><p>Databricks, Tableau, Qlik, SAP, Informatica, and others each support a specific technology</p></li></ul><p>They earn money when a user uses the platform, whether that user purchases external data products or uses their own. They have zero incentive to optimize for a sale. They won&#8217;t invest in it more than the bare minimum, and that explains the following point.</p><h3>Search is not the problem.</h3><p>Generic data marketplaces are just a collection of providers with little more than search features.</p><p>But if we break down the transaction costs for data (search, evaluation, negotiation, price, and ingestion), search turns out to be just a minor element.</p><p>On top of a 5k USD data contract (often 10 times that much), as a buyer you&#8217;d still need to visit the vendor&#8217;s website, call their sales team, get a quote, negotiate some sort of trial, and onboard the data, each time with a different vendor. </p><p><em>Finding</em> the vendor is hardly the hard part. </p><p>There is simply too much work to be done off-platform, talking and negotiating with the vendor. In fact, there is so much work that the role played by the marketplace (search) is not that different from that of search engines with SEO.</p><p>The problem is understanding the price, understanding what data I am buying, and comparing that data with alternatives.</p><div><hr></div><h2>What needs to change</h2><p>Marketplaces are trading all data like a gigantic blob inside a chaotic bazaar where sellers scream their best offers. This doesn&#8217;t instill trust.</p><h3>1. 
Align interests</h3><p>Marketplaces that have skin in the game will work for the benefit of buyers and sellers. When interests are aligned, all parties involved will profit only when the exchanges on the platform take off. </p><p>A &#8220;take rate&#8221; business model is the one that best aligns the platform&#8217;s interests with everyone else&#8217;s, buyers included. When the platform has a stake in a deal, you can be sure it will optimize for it.</p><h3>2. Make pricing clear</h3><p>The bull must be taken by the horns: price is the most critical element in B2B negotiations. Today&#8217;s data market runs on closed-door negotiations, which makes it illiquid. </p><p>This is the opposite of a marketplace, where a more extensive, prosperous economy flourishes because it is liquid, and goods are bought and sold rapidly.</p><p>The price for an exact service needs to be visible and comparable. &#8220;Free plan available&#8221; or &#8220;Subscription options available&#8221; won&#8217;t work.</p><p>Price needs to be a transparent, explicit element, allowing for comparison among like-for-like products. </p><h3>3. Address the transaction pain points</h3><p>A platform needs to drive faster and more frequent (liquid) transactions. To achieve this, we believe vertical specialization is needed. We picked web scraping because it&#8217;s a market we understand very well, and we optimized Data Boutique for it.</p><p>The advantage of being vertically specialized is that it lets you get into the specifics of the purchase journey, from discovery to evaluation, to purchase, to consumption, to after-sales support.</p><p>Let&#8217;s take web scraping: data quality is a big headache for the buyer. Since we only handle this kind of data, we can run a fixed set of quality checks before a data product is listed.</p><p>This way prices become comparable, we can have a like-for-like benchmark, and the purchase decision can be made faster. 
</p><p>This is what buyers and sellers look for, and it is the ultimate goal of a marketplace: more sales.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h3>Annex - screenshots I took so you know I didn&#8217;t make this up</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OpnJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OpnJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 424w, https://substackcdn.com/image/fetch/$s_!OpnJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 848w, https://substackcdn.com/image/fetch/$s_!OpnJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 1272w, https://substackcdn.com/image/fetch/$s_!OpnJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!OpnJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png" width="408" height="449.39130434782606" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:608,&quot;width&quot;:552,&quot;resizeWidth&quot;:408,&quot;bytes&quot;:55719,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OpnJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 424w, https://substackcdn.com/image/fetch/$s_!OpnJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 848w, https://substackcdn.com/image/fetch/$s_!OpnJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 1272w, https://substackcdn.com/image/fetch/$s_!OpnJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b7184a-4383-46db-a70d-5f3c06a765e5_552x608.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">AWS Data Exchange data products</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!11FP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!11FP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 424w, 
https://substackcdn.com/image/fetch/$s_!11FP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 848w, https://substackcdn.com/image/fetch/$s_!11FP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 1272w, https://substackcdn.com/image/fetch/$s_!11FP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!11FP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png" width="316" height="170.50833333333333" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:259,&quot;width&quot;:480,&quot;resizeWidth&quot;:316,&quot;bytes&quot;:17580,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!11FP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 424w, 
https://substackcdn.com/image/fetch/$s_!11FP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 848w, https://substackcdn.com/image/fetch/$s_!11FP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 1272w, https://substackcdn.com/image/fetch/$s_!11FP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4452c7e3-57c7-4e9f-9789-8767057bca18_480x259.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Snowflake Data marketplace data products</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nA_4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nA_4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 424w, https://substackcdn.com/image/fetch/$s_!nA_4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 848w, https://substackcdn.com/image/fetch/$s_!nA_4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nA_4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nA_4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png" width="332" height="186.3931203931204" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:457,&quot;width&quot;:814,&quot;resizeWidth&quot;:332,&quot;bytes&quot;:57635,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nA_4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 424w, https://substackcdn.com/image/fetch/$s_!nA_4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 848w, https://substackcdn.com/image/fetch/$s_!nA_4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 1272w, 
https://substackcdn.com/image/fetch/$s_!nA_4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffad500ca-2784-46cf-84dd-2e9c06392f5f_814x457.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Web Scraping data market</figcaption></figure></div><p></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>About the Project</h2><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. 
Saving datasets to your interest list lets sellers correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[The Impact of ChatGPT on the Web Scraping Industry]]></title><description><![CDATA[Access to live web data is just the tip of the iceberg: advanced analytics and enterprise models push web data adoption]]></description><link>https://blog.databoutique.com/p/the-impact-of-chatgpt-on-the-web</link><guid isPermaLink="false">https://blog.databoutique.com/p/the-impact-of-chatgpt-on-the-web</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Thu, 02 Nov 2023 05:42:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fCT5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data marketplace: Because the smartest way to get this data is to ask those who already collect it.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" 
href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>The Impact of ChatGPT on the Web Scraping Industry</h1><p>November 2023 marks one year since the public release of ChatGPT. The model is trained on large-scale web-scraped data and can now access the internet in real time. </p><p>Given the latest trends, what are the impacts on this data economy?</p><blockquote><p><em>tl;dr: ChatGPT doesn&#8217;t get rid of the need for web data, but it speeds up how you get it, and how you get insights out of it.</em></p></blockquote><h2>Does ChatGPT solve the web data need?</h2><p>Do we still need to extract data from the web when we can ask ChatGPT? </p><p>Unfortunately, while ChatGPT has access to <em>a lot</em> of data, it doesn&#8217;t have access to <em>all the data</em> you need, doesn&#8217;t hold it in the format you need, and wouldn&#8217;t deliver it the way a large-scale web-scraping use case requires.</p><p>Think of applications like revenue optimization for hotels: features like <a href="https://help.openai.com/en/articles/8077698-how-do-i-use-chatgpt-browse-with-bing-to-search-the-web">browse with Bing</a> or the <a href="https://www.kayak.com/news/kayak-chatgpt/">Kayak plugin</a> do offer some relief, but every effort hits a wall when you try to build something intrinsically more robust than search engine results.</p><p>When researching the fashion industry, the <a href="https://roihacks.com/shein-discovery-chatgpt-plugin/">Shein discovery ChatGPT plugin</a> offers some basic results, but its primary purpose is to make you shop on <a href="https://www.shein.com/">Shein</a>, not to help craft your brand strategy.</p><p>When building your own LLM, or a market intel SaaS, you need data extraction from the web, and ChatGPT&#8217;s help can go a long way:</p><ol><li><p>It can help with the extraction problem (data supply), by making it faster</p></li><li><p>It can help us once we have it (data demand) to make sense of 
it</p><p></p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fCT5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fCT5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!fCT5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!fCT5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!fCT5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fCT5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:652238,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fCT5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!fCT5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!fCT5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!fCT5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F208c9374-aae8-49f7-9244-fe843e4104d2_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Supplying the data</h2><p>Without going into detail, systematically extracting data from websites burns resources in two ways: </p><ol><li><p><strong>Time</strong> - writing the code, executing it, and checking the results for completeness absorbs a lot of FTEs, and it&#8217;s more complicated than it was just five years ago. The most innovative applications use generative AI to save <strong>time</strong> by speeding up code writing, flagging quality issues in the retrieved data for easier code maintenance, and shortening the software&#8217;s time to market while keeping it reliable. A private version of the model (like the ChatGPT Enterprise plan) can help build proprietary assets.</p><p></p></li><li><p><strong>Money</strong> - websites often have costly <em>anti</em>-scrape measures that require costly <em>anti-anti</em>-scrape countermeasures. Generative AI can do little about this side of the problem. 
It can teach you how to fish, but it still remains a damn expensive sport. </p></li></ol><h2>Using the data (demand)</h2><p>The use of AI in business intelligence has long been explored by many platforms, though without especially satisfying results. </p><p>Today, ChatGPT&#8217;s <strong>Advanced Data Analysis</strong> feature (formerly Code Interpreter) and plugins like <a href="https://noteable.io/chatgpt-plugin-for-notebook/">Noteable</a> allow you to get a lot done on this side:</p><ul><li><p>You can upload your data (the web data you just collected or had someone collect for you) or connect to your database (maybe a better choice, since the <a href="https://help.openai.com/en/articles/8437071-advanced-data-analysis-chatgpt-enterprise-version">limitations on upload</a> barely reach the definition of &#8220;big data&#8221;)</p></li><li><p>You can interact directly with the data through text prompts, which is especially useful when onboarding a dataset for the first time and when designing quality assurance and ingestion processes.</p></li></ul><p>In their current state, these technologies work better alongside established technology stacks, such as data lakes, data warehouses, or business intelligence tools like <a href="https://powerbi.microsoft.com/">Power BI</a> or <a href="https://www.tableau.com/">Tableau</a>, and will likely serve as co-pilots rather than replace them.</p><p>The main advantages we have seen are:</p><ol><li><p>A boost in analytics capacity for one-off analyses, like M&amp;A due diligence (traditionally incompatible with the timeline of building a custom BI project), making it possible to process larger amounts of data much faster</p></li><li><p>Faster data onboarding on long-term projects, which, combined with private versions of the model like ChatGPT Enterprise, allows the use of internal data and the building of proprietary models.</p></li></ol><h2>Using the data to train LLMs (more demand)</h2><p>Finally, a word on the most data-hungry applications in town: LLMs. 
Perhaps the strongest effect we have seen in web scraping is the rise of new LLMs inspired by ChatGPT.</p><p>Training models is a data-devouring activity, which for many models means <a href="https://medium.com/@michaelnau.dev/web-scraping-for-llms-a66818950b26">scraping</a>, <a href="https://www.youtube.com/watch?v=8uvHH-ocSes">scraping</a>, and again <a href="https://towardsai.net/p/machine-learning/data-scraping-in-the-spotlight-are-language-models-overstepping-by-training-on-everyones-content">scraping</a>. Once trained, the models still need data to process, to be refined, and (again) to be trained.</p><p>Since the launch of <a href="https://www.databoutique.com/">our marketplace</a>, the share of AI applications in our user base has kept growing: from early-stage startups to established SaaS companies, new requests for high-frequency, high-volume, billion-row pricing/product/URL data feeds have only increased. </p><p>The need for data is so intense that these companies would spend more time fixing scrapers than working on the AI if they did not source the data elsewhere. As one user told me, <em>&#8220;We are an AI company, not a web scraping company!&#8221;</em></p><h2>Conclusions</h2><p>We have seen growth in both demand <em>and</em> supply, triggered in part by the advent of large language models and by the rising stakes in the digital economy.</p><p>But web scraping is often kept under cover (&#8220;<em>The first rule of web scraping is: you do not talk about web scraping</em>&#8221; is the opening line of the <a href="https://www.reddit.com/r/webscraping/">webscraping subreddit</a>), and it is still living its wild-west moment, where everyone is on their own, which makes the market difficult to measure and compare.</p><p>A lot has been done on the software side, but too little on its fuel (data). 
Join our project if you want to make web scraping more efficient.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>About the Project</h2><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. Saving datasets to your interest list lets sellers correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[The Hard Economics of Selling Web Data]]></title><description><![CDATA[Market components to consider when selling pre-scraped datasets]]></description><link>https://blog.databoutique.com/p/the-hard-economics-of-selling-web</link><guid isPermaLink="false">https://blog.databoutique.com/p/the-hard-economics-of-selling-web</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Wed, 11 Oct 2023 04:33:06 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!8uwp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data marketplace. If you&#8217;re looking for web data, there is a high chance someone is already collecting it. <a href="https://www.databoutique.com/">Data Boutique</a> makes it easy and safe to buy data from them.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h1>The Hard Economics of Selling Data</h1><p>As we have already seen, <a href="https://blog.databoutique.com/p/scrape-or-buy-the-zalando-case-study">it would make enormous sense to buy pre-scraped data</a> instead of writing new code from scratch. Yet many past efforts to sell datasets didn&#8217;t catch on.</p><p>Why is that? Why do companies hire or commission external consultants to scrape rather than search for pre-scraped data? Why is build preferred to buy?</p><p>Selling data independently can be hard, as unit economics pull against it. But things look different once we understand the market.</p><h3>Web scraping is a commodity</h3><p><a href="https://blog.databoutique.com/p/web-data-is-a-commodity-and-needs">We talked about this here</a>: Since it would be feasible for me to hire someone to code a scraper for a fair share of websites, web scraping can be considered a commodity. 
</p><p>In other words, buyers have alternatives, and pricing datasets right might not be that easy.</p><h3>The price trap</h3><p>Pricing options are, in fact, constrained. We can&#8217;t ask for too much, as few would buy (they have too many alternatives), but we can&#8217;t drop the price far enough to sell to more people either, because that would be uneconomical (reaching each customer costs us too much, and cheap products might <a href="https://blog.databoutique.com/p/expensive-data-and-cannibalism">cannibalize existing ones</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8uwp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8uwp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!8uwp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!8uwp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!8uwp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8uwp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:275349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8uwp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!8uwp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!8uwp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!8uwp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c5153b1-4e8d-4668-ab6d-9856661afe6d_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3>Not Worth Buying: The Cost of Alternatives</h3><p>Some data are unique, and others are a commodity. A commodity, by definition, is hard to price high. 
Every buyer will, in fact, consider the cost of alternatives.</p><p>This cost varies from buyer to buyer, but we can consider it the lowest of the following:</p><ul><li><p>Cost and time of learning web scraping internally</p></li><li><p>Cost and time of hiring someone who can web scrape</p></li><li><p>Cost and time of commissioning the web scraping to a third party</p></li><li><p>Loss or damage from not pursuing web scraping at all</p></li></ul><p>This caps the maximum price a data seller can ask, as there is so little liquidity (so few buyers) at high prices.</p><h3>Not Worth Selling: Customer Acquisition Cost (CAC)</h3><p>Since data can&#8217;t be sold for a lot, a volume strategy (low prices for many buyers) seems the natural alternative. </p><p>However, sellers have no incentive to pursue it, as they&#8217;d be operating at a loss due to high customer acquisition costs: the distance a buyer has to travel to find a pre-scraped dataset is longer than the distance to find a web-scraping expert who can do the work.</p><p>A stalemate. 
Where everybody loses, as the information market, which could trigger so many use cases, can&#8217;t find its way out.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GrjX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GrjX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!GrjX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!GrjX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!GrjX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GrjX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:418428,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GrjX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!GrjX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!GrjX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!GrjX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf49af23-35fa-4fea-9136-bb543d5d4955_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><div><hr></div><h1>How Marketplaces Change This</h1><p>The good news is that there is a way out: if there weren&#8217;t, humanity would still be stuck with in-house agriculture, since the same trap applies to any generally available technology. </p><p>The model that humanity found to break this stalemate is the marketplace: the place, once physical and now digital, where buyers meet multiple sellers and make more, and more diverse, purchases.</p><p>Trading goods and services in a structured marketplace has two major effects that allow the market to get out of the price trap and access the benefits of pre-scraped data:</p><h3>1. 
Lowering the transaction cost</h3><p>As stated by <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Dan Hockenmaier&quot;,&quot;id&quot;:2607892,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e1bdeea-74f6-48f7-8d3f-d42dc03bbc08_1080x1080.png&quot;,&quot;uuid&quot;:&quot;b2bbe8d5-478b-4222-81af-fa577f49e851&quot;}" data-component-name="MentionToDOM"></span> in his <a href="https://www.danhock.co/p/transaction-costs">brilliant essay on marketplaces</a>, marketplaces' purpose is to lower the buyer's effort (transaction cost). </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eZAw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eZAw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!eZAw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!eZAw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!eZAw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eZAw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:235533,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eZAw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!eZAw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!eZAw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!eZAw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce568652-0580-47c7-a012-f46e6f2fe494_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When considering web scraped data, the buyer&#8217;s effort can be broken down into:</p><ol><li><p><strong>Search</strong>: The effort it takes to find a reliable data provider. Among all the features a data marketplace covers, this is the bare minimum: Providing a decent UI to find data providers.</p></li><li><p><strong>Auditing</strong>: The effort (time and cost) it takes to test the result and ensure we can use it. Since web data has legal and quality implications of its own, a dedicated marketplace for web data makes sense. 
We have worked extensively on this point at <a href="https://www.databoutique.com/">Data Boutique</a>, as auditing (quality and legitimacy) is often a deal-breaker.</p></li><li><p><strong>Negotiation</strong>: The effort it takes to understand and negotiate the conditions for the sale. Another big one. When a buyer can seamlessly test, buy, and refresh data from multiple vendors under a single set of Terms and Conditions, it&#8217;s like removing roadblocks with a bulldozer. You realize an industry has a price issue when prices are never displayed upfront. We have always pursued transparency because, in the end, it speeds up transactions.</p></li><li><p><strong>Price paid</strong>: The actual price paid. We don&#8217;t believe the price is too high for web data. We believe the pricing model, inherited from consulting, is broken. We firmly believe consumption-based pricing is the best way to align buyers&#8217; and sellers&#8217; interests.</p></li><li><p><strong>Execution</strong>: The effort it takes to actually ingest the data and make something useful out of it. Almost every data marketplace handles this, mainly because captive marketplaces (Snowflake, Databricks, Tableau, Qlik, etc.) are interested in driving usage of their other services, not of the marketplace itself.</p></li></ol><p>When properly orchestrated, a marketplace gives the buyer many advantages, and the transaction cost decreases (though not necessarily the price).</p><h3>2. Lowering the customer acquisition cost (CAC) for sellers</h3><p>Sellers have upsides here, too. Since buyers are already there, the cost of reaching them is far lower than the cost of acquiring them from elsewhere.</p><p>This is truer in some markets than in others. 
The goods or services need to be a commodity (many sellers), with a fragmented buy-side.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sVcl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sVcl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!sVcl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!sVcl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!sVcl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sVcl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:438650,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sVcl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!sVcl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!sVcl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!sVcl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1650ce9-da98-4311-91bc-18232fec04e8_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p><p>The effect on the CAC is very relevant. 
Sellers can compete on value and price, attracting more buyers and lowering CAC.</p><p>The battle for leaner prices also has huge advantages for sellers: The reduction in price is more than counterbalanced by the growth in the Total Addressable Market (TAM) created.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OLyK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OLyK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!OLyK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!OLyK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!OLyK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OLyK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204998,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OLyK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!OLyK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 848w, https://substackcdn.com/image/fetch/$s_!OLyK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!OLyK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe85e8cc2-ce9c-4a9f-b3ae-e1a81ba00e79_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9rvw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9rvw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!9rvw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 848w, 
https://substackcdn.com/image/fetch/$s_!9rvw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!9rvw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9rvw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:233743,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9rvw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 424w, https://substackcdn.com/image/fetch/$s_!9rvw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 848w, 
https://substackcdn.com/image/fetch/$s_!9rvw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 1272w, https://substackcdn.com/image/fetch/$s_!9rvw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe78971c-85f3-4cc2-9fbb-21011a4f0a7b_3780x3780.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We will discuss more on the benefits of marketplaces in future posts, as there are many facets to understand that make this model 
particularly well suited to solving the data distribution problem.</p><div><hr></div><h2>About the Project</h2><p>That was it for this week!</p><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. Saving datasets to your interest list will allow sellers to correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[Web Scraping Legal Context]]></title><description><![CDATA[As the need for data grows, so does the need for clearer regulation]]></description><link>https://blog.databoutique.com/p/web-scraping-legal-context</link><guid isPermaLink="false">https://blog.databoutique.com/p/web-scraping-legal-context</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Wed, 20 Sep 2023 04:54:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dAmo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data marketplace. If you&#8217;re looking for web data, there is a high chance someone is already collecting it. 
<a href="https://www.databoutique.com/">Data Boutique</a> makes it easy and safe to buy data from them.</em></p><div><hr></div><h1><strong>What is the legal context for Web Scraping?</strong></h1><p>My long-time business partner <strong><a href="https://open.substack.com/users/42673781-pierluigi-vinciguerra?utm_source=mentions">Pierluigi Vinciguerra</a></strong> and I have been in the web-scraped data business for quite a while now. We have collected and sold data to some of the largest companies in Asia, Europe, the UK, and the USA, in industries ranging from real estate and hedge funds to consumer electronics, fashion, and luxury.</p><p>We have always paid close attention to the legality of web scraping, and if you are a web data user or are involved in web scraping, you should care too.</p><p>While web scraping is as old as the Internet itself, it has never found a proper place in the legislation of many countries (here is a <a href="https://www.quinnemanuel.com/the-firm/publications/the-legal-landscape-of-web-scraping/">recent post by the law firm Quinn Emanuel on the US market</a>).</p><p>Although lawmakers have not given it a dedicated space, it is still so wi(l)dely used that some high-level conclusions can be drawn.</p><h4><strong>Self-guidance organizations</strong></h4><p>Professional investors, such as hedge funds, were among the first to adopt some form of self-guidance for web scraping, as they operate in an SEC-regulated market, with a <a href="https://mccarthylg.com/sec-puts-web-scraping-and-the-investment-firms-who-use-it-in-the-crosshairs/">specific emphasis on Material Nonpublic Information</a> (MNPI) and the risks of insider trading.</p><p>These self-guidance organizations are the <a href="https://www.investmentdata.org/">Investment Data Standards Organization</a> (IDSO) and the <a href="https://fisd.net/alternative-data-council/">Alternative Data Council of FISD</a>. They have done, and still do, an excellent job of shedding light on a very complex and sometimes very technical subject. 
Please comment if you know of other organizations, especially in other industries.</p><h4><strong>The rise of genAI and the need for web data</strong></h4><p>The recent surge in genAI demand for web data, mixed with Silicon Valley&#8217;s &#8220;move fast and break things&#8221; philosophy, has driven a rise in lawsuits hitting companies of every size.</p><p>As a reminder, here is a non-exhaustive list of three recent lawsuits related to web scraping:</p><ul><li><p>September 2023, a <a href="https://www.reuters.com/legal/litigation/openai-microsoft-hit-with-new-us-consumer-privacy-class-action-2023-09-06/">class action was filed against OpenAI and Microsoft</a> over how ChatGPT used personal data</p></li><li><p>July 2023, <a href="https://www.reuters.com/legal/litigation/google-hit-with-class-action-lawsuit-over-ai-data-scraping-2023-07-11/">Google hit with class-action lawsuit over AI data scraping</a></p></li><li><p>June 2023, <a href="https://www.reuters.com/legal/lawsuit-says-openai-violated-us-authors-copyrights-train-ai-chatbot-2023-06-29/">Lawsuit says OpenAI violated US authors' copyrights to train AI chatbot</a></p></li></ul><p>Since the mission of Data Boutique is to promote the safe and fair use of web-scraped data, we will use this post to try to define the high-level context. We hope this will help someone out there.</p><div><hr></div><h2><strong>Web Scraping Legal Context</strong></h2><p><em>General advice: We are not lawyers; don&#8217;t take our word as gospel, and speak to your legal counsel if you have questions on web scraping. Data regulation varies materially by geography and changes over time. 
We do our best to provide helpful information, but this is our own interpretation and might be inaccurate.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dAmo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dAmo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dAmo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dAmo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dAmo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dAmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;when is webscraping legal&quot;,&quot;title&quot;:&quot;when is webscraping legal&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="when is webscraping legal" title="when is webscraping legal" srcset="https://substackcdn.com/image/fetch/$s_!dAmo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dAmo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dAmo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dAmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0112da74-a30b-4bf1-b1c6-c38d30f6f57c_2912x2912.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Web scraping legal context</figcaption></figure></div><p></p><p>When it comes to web scraping, there are two elements we need to consider:</p><ol><li><p>The nature of the data, which affects our right to buy, use, transform, and sell it</p></li><li><p>The visibility of the data, which concerns the act of collecting itself and whether it breaches property laws or existing contracts or agreements</p></li></ol><h3><strong>1. The nature of the data collected</strong></h3><p>Regulators and lawmakers focus mainly on protecting citizens&#8217; privacy and copyright holders&#8217; interests. 
There are three types of protected data:</p><ol><li><p>Personal Data</p></li><li><p>Intellectual Property</p></li><li><p>&#8220;Sui generis&#8221; Database</p></li></ol><h4><strong>Personal Data</strong></h4><p><a href="https://www.investopedia.com/terms/p/personally-identifiable-information-pii.asp">Personally Identifiable Information (PII)</a> is information that, when used alone or with other data, can help identify an individual. Privacy laws, like the European General Data Protection Regulation (<a href="https://en.wikipedia.org/wiki/General_Data_Protection_Regulation">GDPR</a>) or the California Consumer Privacy Act (<a href="https://en.wikipedia.org/wiki/California_Consumer_Privacy_Act">CCPA</a>), protect the rights of citizens over this information.</p><p>If you collect an individual's name from LinkedIn, for example, without their consent, you are very likely violating privacy rules. So, as a rule of thumb, don&#8217;t.</p><h4><strong>Intellectual Property</strong></h4><p><a href="https://en.wikipedia.org/wiki/Intellectual_property">Intellectual Property</a> (IP) is content that is the result of human creativity, from images to audio, to written content, computer code, and so on. Humanity has used variations of this concept <a href="https://en.wikipedia.org/wiki/Venetian_Patent_Statute">since 1474</a>, and it is very well protected from a legal standpoint.</p><p>To be considered Intellectual Property, the content needs to be original and have a form of uniqueness. A long-form text, like a <a href="https://www.nytimes.com/interactive/2022/10/20/travel/things-to-do-milan.html">NYT article</a>, fits the definition. 
A generic text like &#8220;Blue T-shirt&#8221; does not.</p><p>In general, in order to use, transform, or sell intellectual property, you need to have the IP owner&#8217;s consent.</p><p>There are exceptions, like the <a href="https://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/">fair use of copyrighted material</a> and the debate to extend it <a href="https://techpolicy.press/copyright-fair-use-regulatory-approaches-in-ai-content-generation/">also for training AI</a> (allowing copyrighted material to be used by AI even without the consent of the owner). More <a href="https://www.reedsmith.com/en/perspectives/ai-in-entertainment-and-media/2023/06/text-and-data-mining-around-the-globe#:~:text=Text%20and%20data%20mining%20in,and%20another%20for%20everyone%20else.">exceptions</a> to the use of copyrighted material exist around the world, and the situation is always evolving.</p><h4><strong>&#8220;Sui Generis&#8221; Database</strong></h4><p>Sui Generis Property Right (or <a href="https://en.wikipedia.org/wiki/Database_right">Database Right</a>) protects databases that required great effort to build, regardless of their originality. If there was a substantial investment, either financial or of work, in creating, validating, and presenting the database itself, the law protects it. It exists in the EU, the UK and other countries but, as of today, not in the USA.</p><p><strong>There are exceptions</strong> to the collection and use of copyrighted information related to the <strong>purpose</strong> of the data collection (academic and scientific research may qualify) or <strong>the extent</strong> of the scraping (scraping a little is tolerated; scraping the whole website is not).</p><h4><strong>What is not covered?</strong></h4><p>If web scraping for personally identifiable information (PII), IP-protected data, or EU databases <strong>exposes you to risks</strong>, scraping data that is neither PII nor IP-protected does less so. 
This is where it gets interesting (and yes, debated).</p><p>When is content not covered by IP? When is a database not covered by the sui generis right?</p><p><strong>Are prices and timetables of an airline covered by some form of IP</strong>? <a href="https://www.martinimanna.com/blog/ryanair-and-the-protection-of-its-flight-information-database-a-recent-ruling-of-the-ecj">This 2015 European Court of Justice ruling</a> stated that Ryanair flight prices and timetables did not constitute IP content (prices and timetables are factual measures, which do not have the intrinsic element of originality) and the database did not have the right to be covered under EU Database legislation. Web scrapers can still be sued for breaking website terms and conditions.</p><p><strong>Are ChatGPT texts, or Midjourney images, protected by copyright?</strong> A federal judge ruled that <a href="https://www.spiceworks.com/tech/artificial-intelligence/news/us-copyright-law-ai-generated-content/">artificial intelligence or AI-generated content can&#8217;t be copyrighted</a>. If this were also confirmed by other rulings and in other jurisdictions, entire AI-generated websites could not have their content protected.</p><p>These two points mark an interesting line for web scraping. As long as collecting them doesn&#8217;t otherwise violate the law, these data seem legitimate to collect.</p><h3><strong>2. The visibility of data</strong></h3><p>Data on the internet can be accessed in many ways, some more legal than others.</p><p>It is easy to understand that breaking into computer systems to access information is different from accessing information available to anyone with no particular technical skill.</p><h4><strong>Non-public information</strong></h4><p>Any internal website, company portal, or anything that prevents an outsider from seeing the information, is non-public. 
If only employees of a company can see it, or if it is information reserved to a single individual (like a bank account) that needs the user&#8217;s credentials to access, it is non-public.</p><p>If the user has given consent for the data to be scraped, and the conditions for this consent are met, then scraping is possible. But if any of these elements fails, web scraping is very risky. A basic rule of thumb: If you need to hack it to see it, don&#8217;t scrape it.</p><h4><strong>Public information</strong></h4><p>When anyone with an internet connection can see the information, it is public.</p><p>Some websites, even if public, require you to agree to terms and conditions to see content. If you don&#8217;t agree, you get no access.</p><p>It is quite possible that these terms prohibit web scraping (see the LinkedIn example below).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iRCw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iRCw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iRCw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iRCw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!iRCw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iRCw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg" width="492" height="346" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:492,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Linkedin user agreement on web scraping&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Linkedin user agreement on web scraping" title="Linkedin user agreement on web scraping" srcset="https://substackcdn.com/image/fetch/$s_!iRCw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 424w, https://substackcdn.com/image/fetch/$s_!iRCw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 848w, https://substackcdn.com/image/fetch/$s_!iRCw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!iRCw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b0b4cb1-376f-432f-9932-ed1bf91bda49_492x346.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">LinkedIn User Agreement 8.2.2 paragraph, as of September 18th, 2023</figcaption></figure></div><p></p><p>According to self-guidance organizations mentioned above, only Terms and Conditions that require explicit acceptance by users (click-wrap ToS) are valid, while generic footnotes (browse-wrap ToS) do not provide sufficient ground for 
enforcement.</p><h4><strong>Robots.txt</strong></h4><p>This is probably the most relevant element for web scraping activities: A little file called robots.txt, found at <a href="http://www.websitename.com/robots.txt">www.websitename.com/robots.txt</a>, is the <strong>explicit indication from websites to robots</strong>.</p><p>This little file, optionally present in many websites, tells robots where they are allowed to crawl. This is not the law. It&#8217;s a <a href="https://en.wikipedia.org/wiki/Robots.txt">&#8220;de facto standard&#8221;, born in 1994</a>. It&#8217;s a way for the website owner to tell bots where to go.</p><p>For example, the Farfetch.com file <a href="https://www.farfetch.com/robots.txt">https://www.farfetch.com/robots.txt</a> contains this line:</p><pre><code>Allow: /it/*?*lang=it-IT</code></pre><p>This tells bots <strong>they are allowed to crawl</strong> the pages under <a href="http://www.farfetch.com/it/">www.farfetch.com/it/</a> whose URL carries the lang=it-IT parameter.</p><div><hr></div><h2><strong>How can a data buyer know?</strong></h2><p>Assessing whether web-scraped data violates IP rights, database laws, or robots.txt rules can be quite exhausting. This is why we promote safe access to data.</p><p>Web scraping is only a tool, just like a hammer. You can use the hammer to build useful things like a bridge or a house, or use it fraudulently to smash a car&#8217;s window and steal from it. Understanding the difference defines how we interact with society.</p><p>We do our best to help web scraping, which is at the foundation of so many technologies and websites we know or use, come out of its &#8220;Wild West moment&#8221;.</p><p>Web data is everywhere. We need to start treating it and trading it properly.</p><div><hr></div><h2>About the Project</h2><p>That was it for this week!</p><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. 
You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. Saving datasets to your interest list will allow sellers to correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[5 Reasons You Should Stop Web Scraping]]></title><description><![CDATA[An approach to web data as a service]]></description><link>https://blog.databoutique.com/p/5-reasons-you-should-stop-web-scraping</link><guid isPermaLink="false">https://blog.databoutique.com/p/5-reasons-you-should-stop-web-scraping</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Fri, 15 Sep 2023 04:21:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Hj8y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data marketplace. If you&#8217;re looking for web data, there is a high chance someone is already collecting it. 
<a href="https://www.databoutique.com/">Data Boutique</a> makes it easier to buy web data from them.</em></p><div><hr></div><h1>Web data is an enabler, not the core.</h1><p>Recent decades have seen the creation of countless applications that use web data, from generative AI to market intelligence, from search to dynamic pricing. <a href="https://www.wired.com/insights/2014/07/data-new-oil-digital-economy/">Data is the new oil</a> (they said 10+ years ago).</p><blockquote><p><strong>But let&#8217;s not be confused: Web data is the </strong><em><strong>enabler</strong></em><strong> for these apps, it is not </strong><em><strong>the</strong></em><strong> apps.</strong></p></blockquote><p>Data is an enabler, just like <em>servers</em> are. Yet nobody owns servers anymore: We access them &#8220;as a service&#8221;.</p><p>Data is an enabler, just like <em>business software</em> is, from CRM to analytics tools. 20 years ago, companies would write their own. Today, accessing them &#8220;as a service&#8221; is the <em>best practice</em>.</p><p><em>Web data</em> is an enabler. Once it made sense to collect it internally. Now, web scraping has become a resource-eating, risk-exposing activity. 
The number of challenges a scraping developer has to go through, from chasing websites&#8217; code dynamics, to circumventing anti-bot software, significantly raised the bar.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.linkedin.com/posts/pierluigivinciguerra_is-web-scraping-becoming-harder-activity-7106883355916316673-DIe_?utm_source=share&amp;utm_medium=member_desktop" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mBwn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 424w, https://substackcdn.com/image/fetch/$s_!mBwn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 848w, https://substackcdn.com/image/fetch/$s_!mBwn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 1272w, https://substackcdn.com/image/fetch/$s_!mBwn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mBwn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png" width="530" height="276" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:276,&quot;width&quot;:530,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59435,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.linkedin.com/posts/pierluigivinciguerra_is-web-scraping-becoming-harder-activity-7106883355916316673-DIe_?utm_source=share&amp;utm_medium=member_desktop&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mBwn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 424w, https://substackcdn.com/image/fetch/$s_!mBwn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 848w, https://substackcdn.com/image/fetch/$s_!mBwn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 1272w, https://substackcdn.com/image/fetch/$s_!mBwn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd95ed21b-4baf-4209-a8f7-bbbfae2424cf_530x276.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><h1>5 reasons to ditch web scraping and start buying it as a service (DaaS)</h1><p>Should a company switch from in-house scraping for public web data to Data as a Service (DaaS)? 
Let&#8217;s see when it makes sense.</p><p>I use the term &#8220;public web data&#8221; as it implies that it is accessible and collectible by anyone with the proper tools, and it is not something exclusive to the company that will be using it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hj8y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hj8y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Hj8y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Hj8y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Hj8y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hj8y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:105241,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hj8y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Hj8y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Hj8y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Hj8y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e32bc3-4a33-4cd9-a537-e897f4fb0b05_1920x1080.jpeg 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h3>1. Cost</h3><p>Reason number one for switching to DaaS: Cold, hard cash. </p><p>Think of all the cost components of in-house web scraping - people involved, proxy providers, hardware, tools, or the annual fee from data farms when outsourced.</p><p>These costs are: </p><ol><li><p>Very inelastic - once engaged, they&#8217;re not very sensitive to scaling down the frequency of the scraping</p></li><li><p>Opaque - it is hard to tell if you&#8217;re paying too much (there is no price benchmark for this)</p></li></ol><p>By switching to DaaS, these costs get SUPER elastic, and you know you are paying fair market prices - the two things combined often result in a 100X cost reduction for low-frequency data refreshes.</p><h3>2. Scalability</h3><p>How fast can an in-house team add 10 websites? And how fast can they stop those websites and start doing 10 others instead? </p><p>This is maybe the largest advantage of DaaS. 
</p><ul><li><p>You can scale in frequency: Going from monthly data collection to weekly or daily can be handled on the fly and changed (scaled up or scaled down) anytime, as many times as necessary. There are no commitments. </p></li><li><p>You can scale in scope: If you want to add (or remove) 10 websites similar to the ones you are already collecting, this can also be handled on the fly, with no technical knowledge or execution delay. </p></li></ul><p>As a consequence, you can build a PoC (Proof of Concept) very fast and at very little cost, and then scale it up to production right when needed.</p><h3>3. Talent allocation</h3><p>Spending time figuring out which line of code broke in your scraper might not be the best use of your talent&#8217;s time. Why? Because there are hundreds of other talented people out there doing the same thing, for <em>that exact same website</em>. </p><p>It would be much more efficient to spend that time on how to use this data: how to transform it, structure it, change domains, convert, and look up values. These are all activities that were <em>already</em> there before, but were understaffed.</p><p>If you free your talent&#8217;s time from activities that can be bought on the market, you get all hands on what&#8217;s critical and really differentiating. </p><h3>4. De-risking</h3><p>Keeping web scraping internal? Be prepared to go through due diligence, maintain policies, adhere to regulations, and disclose procedures and logs. For those selling data to hedge funds and other regulated entities, this may not come as a surprise, but copyright and privacy lawsuits over web scraping activities are becoming frequent for AI and SaaS companies too.</p><p>Again: Procedures, logs, disclosure. Is this the best way to allocate your talent&#8217;s time?</p><p>Or, if you think this is not core to you, buy from a platform instead and have this sorted. </p><h3>5. 
Strategic Positioning</h3><p>All this finally leads to the core question: Is your company a SaaS/AI company, or is it a web scraping company?</p><p>Aligning all activities with your strategic positioning helps you focus on what&#8217;s differentiating about your company. Just like it is not core to own and maintain servers, or to develop and maintain CRM software, maybe web scraping is not core either when the data can be found on the market.</p><div><hr></div><h1>Is web scraping core instead? Then monetize it.</h1><p>This industry has a lot of talented players. Many of them, having worked for years in web scraping, are just not yet ready to jump to DaaS. I understand that. </p><p>Are you more confident in your own data? Is your cost base competitive with what you find on Data Boutique? Is your data acquisition pipeline so strong you&#8217;d trust it more than what you&#8217;d find on a marketplace?</p><p>Fantastic. You should then monetize this capability, and sell on Data Boutique. It&#8217;s a win-win.</p><div><hr></div><h2>About the Project</h2><p>That was it for this week!</p><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. 
Saving datasets to your interest list will allow sellers to correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item><item><title><![CDATA[Network Effects: When Value Comes from Others Joining in]]></title><description><![CDATA[... and Data Boutique Partnership Program launch]]></description><link>https://blog.databoutique.com/p/network-effects-when-value-comes</link><guid isPermaLink="false">https://blog.databoutique.com/p/network-effects-when-value-comes</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Fri, 01 Sep 2023 04:24:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KJ4R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data marketplace. If you&#8217;re looking for web data, there is a high chance someone is already collecting it. <a href="https://www.databoutique.com/">Data Boutique</a> makes it easier to buy web data from them.</em></p><div><hr></div><h1>Network Effects: When Value Comes from Others Joining in</h1><p><em><strong>TL;DR: Network effects in Data Boutique favor buyers who refer buyers, and sellers who refer sellers, because of mutual self-interest. 
Plus: The launch of our Partnership Program: Long-term value sharing with our network.</strong></em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/referral-network&quot;,&quot;text&quot;:&quot;Join The Program&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/referral-network"><span>Join The Program</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KJ4R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KJ4R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KJ4R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KJ4R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KJ4R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KJ4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:135058,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KJ4R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KJ4R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KJ4R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KJ4R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39cd41bf-8f6b-4cae-97f5-01347217aa5c_1920x1080.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Buying and selling web scraped data is a fragmented business, and this fragmentation causes hidden costs.</p><p>Whether you buy it or hire someone to collect it, it&#8217;s a one-on-one relationship between buyer and seller: Finding each other, selecting, testing, validating, and negotiating the transaction takes time. </p><p>You must go through the same process when you need it again. Every. Single. Time.</p><p>This is not the case for <em>data marketplaces</em> (heavily managed marketplaces<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, to be precise): Once on the platform, every time you buy data, most of these costs disappear. 
From the second purchase onwards, <em>transaction costs are lower</em>: Data streams with the same format, structure, delivery method, quality controls, and payment methods.</p><p>But there is more: Every single person who joins the platform adds value to all other actors.</p><h3>More buyers is good for buyers (cost-wise)</h3><p>What is obvious is that more buyers is a good thing for sellers.</p><p>What is less obvious is that <em>more buyers is also a good thing for</em> <em>buyers</em>. Let&#8217;s see why.</p><ol><li><p>Data selling (with pre-scraped datasets) is a fixed-cost activity. Once the collection costs are covered, every new sale simply adds margins.</p></li><li><p>The more sales units, the more margin a seller has to lower prices and attract more buyers.</p></li></ol><p>In practice: If a data product is purchased by just one client, the seller is forced to keep higher prices to cover costs. But this is less than ideal, as the convenience vs. in-house web scraping would be smaller.</p><p>When more clients buy the same product, scale economies will give the seller room to lower unit prices and attract more sales. Competition, in fact, is not seller vs. seller, but seller vs. in-house scraping.</p><blockquote><p>Although buyers may be skeptical about suggesting to others where to source data, in the end it&#8217;s a self-interest mechanism: Data Buyers have a direct incentive to refer users to the platform <em>as it lowers their future costs</em>.</p></blockquote><p>Let&#8217;s say I know Peter. Peter is using web data for image recognition in fashion e-commerce. I am using web data for market intel in fashion. I <em>want</em> Peter to join the marketplace, as in the long run costs will go down <em>for me</em>.</p><h3>More sellers is good for sellers (density)</h3><p>Let&#8217;s look at the other side: Sellers. </p><p>Web scraping is highly competitive. 
Try posting a web-scraping job on <a href="https://www.upwork.com/">Upwork</a> and see how many apply: you&#8217;d be surprised. So, it&#8217;s natural for sellers to be reluctant when other sellers join a marketplace, as they are afraid of competition. </p><p>In a <em>talent</em> marketplace (Upwork, or <a href="https://www.fiverr.com/">Fiverr</a>), this is true: The more people doing web scraping join in, the higher the competition. It&#8217;s a losing price-squeezing game.</p><p>But the economics of <em>data</em> marketplaces are different: If sellers offer different data products (<em>and we as market makers need to make sure this is incentivized</em>), more sellers means more products. It&#8217;s a win-win. Here&#8217;s why:</p><blockquote><p>Data products, unlike others, are not mutually exclusive: They are mutually reinforcing. Buyers searching for dataset A are more likely to purchase both A <em>and</em> B if B is available and adjacent to A, as it completes the information set.</p></blockquote><p>An example: Let&#8217;s say I&#8217;m a data seller offering H&amp;M data, and I know Tina. Tina is scraping the ZARA website. <strong>It is in my interest to have Tina with me on the platform offering ZARA, because having both brands available raises my sales of H&amp;M</strong>. The whole is greater than the sum of its parts. </p><p>Now, if Matthew joins, offering MANGO or BERSHKA, we&#8217;re getting closer to building a cluster of data for fast-fashion brands that none of us alone could easily build, given the cost of collecting all of the brands individually.</p><p>Some may argue that I could have scraped ZARA myself instead of calling in Tina, but in reality I would gain more by scraping UNIQLO or FOREVER21 and still calling Tina in for ZARA.</p><p>Sellers have an incentive to put their effort into adding websites that others don&#8217;t cover, and to call in other sellers for websites that are already covered. 
This allows them to grow the graph faster and, together, reach a larger audience than they&#8217;d be able to reach alone.</p><div><hr></div><h3>Plus: Launching The Partnership Program</h3><p>We have seen network effects. Now let&#8217;s talk about the Partnership Program:</p><p>Soon after its launch, we saw Data Boutique&#8217;s impact reach <em>beyond</em> the core web-scraping community: Decoupling data collection from data usage <em>created new applications</em>. Without the burden of data sourcing, teams could access data and create PoCs and MVPs quickly, safely, and with low investment. </p><p>From Business Intelligence to AI, from price monitoring to market research, the applications are countless.</p><blockquote><p>This is why <strong>we decided to open our Partnership Program</strong>: To strengthen our outreach to those applications: BI teams within corporates, AI startups, and geeks. We offer a multi-year value-sharing program, rewarding the growth of the network.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/referral-network&quot;,&quot;text&quot;:&quot;Join The Program&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.databoutique.com/referral-network"><span>Join The Program</span></a></p><p>Enter the program and refer data-hungry companies. It&#8217;s a variable-sum game. </p><p>Together, all win.</p><div><hr></div><h2>About the Project</h2><p>That was it for this week!</p><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. 
Saving datasets to your interest list will allow sellers to correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Iscriviti adesso&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Covering search, bargaining, enforcement, and distribution. See <a href="https://www.danhock.co/p/service-marketplaces">Dan Hockenmayer</a>&#8217;s post on different marketplace types.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Problems When Disclosing How Data Is Collected]]></title><description><![CDATA[Recent news highlights how in-house data collection might not be taken well by users.]]></description><link>https://blog.databoutique.com/p/problems-when-disclosing-how-data</link><guid isPermaLink="false">https://blog.databoutique.com/p/problems-when-disclosing-how-data</guid><dc:creator><![CDATA[Andrea Squatrito]]></dc:creator><pubDate>Fri, 25 Aug 2023 04:09:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F01280021-8d9b-4d20-8d73-cf0d6d150ed2_1064x1064.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3><em>About Data Boutique</em></h3><p><em><a href="https://www.databoutique.com/">Data Boutique</a> is a web-scraped data 
marketplace. </em></p><p><em>If you&#8217;re looking for web data, there is a high chance someone is already collecting it. <a href="https://www.databoutique.com/">Data Boutique</a> makes it easier to buy web data from them.</em></p><p><em>Join our platform to learn about and discuss this project:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.databoutique.com/&quot;,&quot;text&quot;:&quot;Join Data Boutique&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.databoutique.com/"><span>Join Data Boutique</span></a></p><div><hr></div><h1>Zoom, GPTbot, and Google &#8220;fair use&#8221; of copyrighted data</h1><p>Three pieces of news relevant to what we do at Data Boutique came out recently:</p><ol><li><p>Videoconferencing app <strong><a href="https://stackdiary.com/zoom-terms-now-allow-training-ai-on-user-content-with-no-opt-out/">Zoom </a></strong><a href="https://stackdiary.com/zoom-terms-now-allow-training-ai-on-user-content-with-no-opt-out/">Terms of Service updates</a> caused resentment among users, who feared their personal conversations might be (or might have been) used to train AI. Zoom later said users need to opt in to the service, but the change nonetheless caused a heated debate;</p></li><li><p><strong><a href="https://platform.openai.com/docs/gptbot">OpenAI</a></strong><a href="https://platform.openai.com/docs/gptbot"> released the specifics of the GPTbot</a>, a web crawler aimed at collecting data for LLMs. 
OpenAI states that it allows websites to opt out of being scraped, but the question remains what incentive websites would have to stay in, since they would be supplying data for free, with no reward, to a paid service backed by Microsoft;</p></li><li><p>In an Australian court, <strong><a href="https://www.tomshardware.com/news/google-ai-scraping-as-fair-use">Google</a></strong><a href="https://www.tomshardware.com/news/google-ai-scraping-as-fair-use"> is claiming &#8220;fair use&#8221; for its collection of copyrighted data</a>. Basically, Google admits unlawful crawling of copyrighted data, claiming it shouldn&#8217;t be considered unlawful. As in the OpenAI case, Google claims that websites can opt out of the service, which raises the same questions as the GPTbot case (provided that they can separate it from the Google bot that serves the search engine).</p></li></ol><p>While this is not the place to discuss the details of this news, I am leaving a link to a podcast I recommend by <a href="https://twitter.com/nlw">Nathaniel Whittemore</a> covering the Zoom and OpenAI cases (looking forward to a new one on Google).</p><iframe class="spotify-wrap podcast" data-attrs="{&quot;image&quot;:&quot;https://i.scdn.co/image/ab6765630000ba8a55521ec5c821aade2ef99cbb&quot;,&quot;title&quot;:&quot;GPTBot AI Data Controversy and the Remaining Challenges of LLMs&quot;,&quot;subtitle&quot;:&quot;Nathaniel Whittemore&quot;,&quot;description&quot;:&quot;Episode&quot;,&quot;url&quot;:&quot;https://open.spotify.com/episode/4YTsKkBSa9I8uDdNvw9xa1&quot;,&quot;belowTheFold&quot;:true,&quot;noScroll&quot;:false}" src="https://open.spotify.com/embed/episode/4YTsKkBSa9I8uDdNvw9xa1" frameborder="0" gesture="media" allowfullscreen="true" allow="encrypted-media" loading="lazy" data-component-name="Spotify2ToDOM"></iframe><h2>Why this is relevant to us</h2><p>What is interesting for our case is the rising disclosure of data collection methods for AI, driven by user debate, regulators, and court filings. 
In summary:</p><ol><li><p>Public opinion and regulators are <strong>paying more attention</strong> to what data is used in AI and are increasingly skeptical about providing their own for free;</p></li><li><p>Companies are forced to <strong>move out of stealth</strong> and disclose their data collection operations;</p></li><li><p>Current solutions seem to converge on adding an <strong>opt-in/out feature</strong>, which suggests a transition from a &#8220;complete but unethical&#8221; sourcing model to a &#8220;clean but partial&#8221; one, as we can expect many to opt out.</p></li></ol><div><hr></div><h2>The Data Market Question</h2><p>The Zoom case concerns data on how users interact with the application, which cannot be collected in any other way (though that does not imply users are willing to consent). By contrast, the <strong>OpenAI and Google cases concern the collection of general-purpose data</strong>, and they represent just a tiny portion of the AI ecosystem that feeds its business models with web-scraped data and is in the same situation.</p><blockquote><p><em>In the eyes of the consumer, they stand accused not of the quality and innovation of their algorithms and models but of the way they source data.</em></p></blockquote><p>There is an obvious <strong>decoupling</strong> of the business of <em>collecting </em>data from the business of <em>using </em>the data, especially since progress in technology made the tools for using this data available to everyone.</p><p>In our vision, this separation between data collection and usage, and the increasing attention from public opinion and regulators on the former, <strong>sets the stage for the general adoption of data markets</strong>: Markets for data that exists regardless of its use cases, whose collection methods need to adhere to accepted rules, and which requires quality assurance to prevent selection bias in the applications built on top of it.</p><p>A data market is where data 
collection is brought to light, adopts common rules, performs professionally and reliably, and is offered at fair market prices.</p><div><hr></div><h2>The Value of Accountability</h2><p>When AI projects keep data collection in-house, they are onboarding a commoditized portion of the value chain with an embedded responsibility on a critical aspect in today&#8217;s society: <strong>Where and how did you get the data</strong>? Users, public opinion, regulators, and investors will sooner or later pose that question. </p><p>Data marketplaces address this. By being accountable for this, they relieve data and AI projects from the burden of a time- and resource-consuming activity loaded with reputational and operational risks. </p><p>Although not all data needs can be addressed by marketplaces, <strong>why on earth would you embark on something that burns cash, takes time, brings on risks, and raises eyebrows when you could have found it in a store? Because data is critical to your project? </strong>That is the exact reason why <em>you should not</em> do it in-house.</p><div><hr></div><h2>Join the Project</h2><p>That was it for this week!</p><p>Data Boutique is a community for sustainable, ethical, high-quality web data exchanges. You can <a href="https://www.databoutique.com/buy-data-list">browse the current catalog</a> and add your request if a website is not listed. 
Saving datasets to your interest list will allow sellers to correctly size the demand for datasets and join the platform.</p><p>More on this project can be found on <a href="https://discord.gg/yXGasRHYrb">our Discord channels</a>.</p><div><hr></div><p>Thanks for reading and sharing this.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.databoutique.com/subscribe?&quot;,&quot;text&quot;:&quot;Iscriviti adesso&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.databoutique.com/subscribe?"><span>Subscribe now</span></a></p><p> </p>]]></content:encoded></item></channel></rss>