Tools for eCommerce businesses have become so powerful that everyone uses them. And it’s great. Except when you need to make an analysis and the data comes from different sources.
Launching and managing an eCommerce store is now easier than ever. The amazing amount of new tools operating as SaaS (software-as-a-service) has changed the way digital businesses operate. The SaaS revolution of the last 10 years has hugely impacted almost every industry, allowing digital incumbent players to compete with large and resourceful companies. Retail, with the transition to eCommerce, is one of the most impacted, as there were huge barriers to build a large physical retail business (rentals, complex operations…). The current platforms and tools allow a small team with a great product to reach a lot of customers and manage the complexity to sell worldwide.
In an online store, the main tool is the “eCommerce Platform”: Shopify, WooCommerce, Prestashop, Magento… The most relevant information is stored there (customers, orders…). But there are a lot of other sources around them. For instance, apps or extensions that manage returns, CRM platforms, customer service tools, Google Analytics… All those data sources mainly pull information from the “eCommerce Platform”, but don’t send information back to it.
Additionally, the most relevant suppliers offer great platforms. The most clear example is advertising networks such as Facebook Ads and Google Ads. With a few clicks, you can set up a targeted campaign to reach your potential audience thanks to the great technology the tech giants have built. They hold lots of data, and they offer their customers relevant data, such as which ads are working better, to which countries… But they can’t offer other kinds of information, such as which percentage of the total sales is generated by their ads. Because they don’t have the total sales data, they “only” have their data. At a different scale, also several shipping and logistic companies have great platforms too, where their customers can manage shipments easily and get a lot of information, again hugely disconnected from other tools.
All those platforms operate independently in a technical dimension. For sure there are integrations and API connections, but much less than what is needed. On the one hand, it’s OK for a lot of processes because tools and platforms focus mainly on one process: acquiring users from Facebook, shipping an order from the warehouse to the customer… On the other hand, a lot of use cases require information that none of the current tools has completely.
Large companies solve this problem by having a core technology platform managed by them, normally an ERP solution (Enterprise Resource Planning). For them, every relevant business process should go through it, involving tons of custom-built software on a very flexible platform such as SAP. But small and medium-sized companies can’t invest as much as requiered to have a good set-up of an ERP. A deployment of such a solution costs the magnitude of hundreds of thousands, it’s not flexible to constant changes of processes and suppliers, and it’s better not to have it than having it poorly implemented. For standard eCommerce businesses, the “eCommerce Platform” such as Shopify is the closest tool they have to an ERP.
The solution for the business intelligence problem of it is having a place where all the relavant information is merged. The technical name for it is Data Warehouse, also called sometimes Enterprise Data Warehouse. In short, it is a database where the information from the relevant data sources is sent and from where a data visualization tool pulls the information to be displayed in charts, dashboards… In the last few years, types of databases have been developed to operate as Data Warehouses such as Amazon Redshift, Snowflake or Google BigQuery.
The first big challenge is to get the information from the data sources, most frequently via API, and copy this information on the Data Warehouse. Once every information is in the same data source, it can be combined smoothly. For example, taking the order identifier from the information taken from Shopify and searching this identifier in the list of shipments from the carrier – UPS for instance. Once this is done, a visualization tool such as Tableau, Power BI, Superset or Looker can display information from UPS and Shopify combined: a line chart that displays the average amount of days it takes an order from being purchased on Shopify to be delivered by UPS.
And there are more complex analyses very hard to be made without connecting the data sources. For example, computing the margin of a given SKU in a specific country for a given month. Without a Data Warehouse, at last the following data should be downloaded and copied to an Excel spreadsheet to make the calculation:
- Get the amount of net sales and net number of units sold from Shopify
- Get the manufacturing cost of this SKU, probably from an Excel file from someone in the team
- Get advertising cost from Facebook Ads and Google Ads allocated to this SKU – this is a tough one
- Get the shipment cost from UPS – take into account that this SKU can be combined with others in a same shipment so the shipment cost should be split
- Get the payment fee taken by Shopify Payments and Paypal
And one month later, the information is outdated and the work needs to be done again…