Food Prices in Latvian Marketplaces

This is a data project for tracking the food prices in different Latvian retailer e-stores and calculating a “weekly food basket” price for each of them.

I will be using the recommended “weekly food basket” from https://www.lm.gov.lv/lv/majsaimniecibu-relativo-izdevumu-budzeta-mri-budzets-izstrades-metodologija-rezultati as the basis for the calculations. This provides a list of 48 food products with their recommended weekly consumption amounts in kg.

The markets

The marketplaces used as data sources for this project make up about 62% of the total grocery store market (NACE 47.11) in Latvia by Net Turnover. The Total Net Turnover of the market was taken from the Central Statistical Bureau of Latvia (the last available year at the time of publishing was 2022), and the figure for each of the larger marketplaces was taken from its yearly financial report (again, for 2022). The exception is Lats, which is a franchise; there the number was taken from a press release that gave the Net Turnover of all of the partner stores. In the case of Barbora, which is technically a separate legal entity from Maxima (it was split off solely to operate the online store and product delivery), I chose to use the Net Turnover of the main legal entity (“MAXIMA Latvija” SIA) for simplicity and fair comparison.

Adjustments made for the project

I have made some small adjustments to the way I use this list for consistency, such as only looking at frozen berries and fish, since the prices of fresh ones might be more affected by seasonality.

Since the prices of some products, such as eggs and some vegetables, are given per unit, I have converted them to prices per kg using typical average weights for these products. This allows consistent comparison across marketplaces and simplifies the calculations.
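A minimal sketch of this per-unit to per-kg conversion might look like the following. The unit weights and the function name are hypothetical placeholders for illustration, not the values or code actually used in the project.

```python
# Hypothetical typical unit weights in kg (assumed for illustration only;
# the weights used in the project are not listed in this write-up).
TYPICAL_UNIT_WEIGHT_KG = {
    "egg": 0.06,
    "cucumber": 0.3,
}


def price_per_kg(product_type: str, unit_price: float) -> float:
    """Convert a per-unit price into a per-kg price using a typical weight."""
    weight_kg = TYPICAL_UNIT_WEIGHT_KG[product_type]
    return unit_price / weight_kg


# Example: an egg priced at 0.24 EUR, assuming a 60 g egg,
# works out to roughly 4 EUR per kg.
egg_price_per_kg = price_per_kg("egg", 0.24)
```

Keeping the weights in one lookup table means every marketplace is converted with the same assumption, which is what makes the comparison consistent.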

Current approach for calculations

Currently, the cheapest available product in each category is picked from each marketplace and used for the calculations. This approach is simplified, since it allows discounts on individual products to have a larger impact, and I’m thinking about introducing a more complex formula down the road.

The current formula is quite simple: the cheapest available product price per kg found in a marketplace * the recommended weekly product mass = the product’s weekly food basket price.
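As a rough illustration, the formula could be sketched in Python as below. The data layout, the product names, and the numbers are all made up for the example, and summing the per-product costs into one basket total is my assumption about how the final figure is formed.

```python
def basket_price(offers: dict, recommended_kg: dict) -> float:
    """Weekly basket price for one marketplace.

    offers: {product_type: [price_per_kg, ...]} (hypothetical layout)
    recommended_kg: {product_type: recommended weekly mass in kg}
    """
    total = 0.0
    for product_type, mass_kg in recommended_kg.items():
        # Pick the cheapest available offer in this category.
        cheapest = min(offers[product_type])
        # Cheapest price per kg * recommended weekly mass.
        total += cheapest * mass_kg
    return total


# Made-up example data, not real prices or real recommended amounts.
recommended_kg = {"rice": 0.5, "milk": 2.0}
offers = {"rice": [2.0, 1.5, 3.0], "milk": [1.0, 1.25]}

weekly_price = basket_price(offers, recommended_kg)
```

Here rice contributes 1.5 * 0.5 and milk contributes 1.0 * 2.0, so a single discounted item in a category directly lowers the total, which is the simplification mentioned above.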

This isn’t a perfect representation of actual food basket prices, since not all products can be purchased in the exact amounts given in the recommendations. However, this approach does allow for a “fair” and consistent comparison across different marketplaces.

If a product type is missing on a particular day (for example, it is out of stock), the last available price from that particular marketplace is used. Again, this isn’t a perfect approach: items might go out of stock because of a large discount, so this might “artificially” lower the price for a while, until a new price is available. I’m planning on introducing a more complex formula down the road.
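The fallback to the last available price can be sketched as a simple forward fill. This is only an illustration under the assumption that prices for one product type in one marketplace are kept as a chronological list with None on out-of-stock days; the function name is hypothetical.

```python
def fill_missing_prices(daily_prices: list) -> list:
    """Replace None (out of stock) with the last price seen before it."""
    filled = []
    last_seen = None
    for price in daily_prices:
        if price is not None:
            last_seen = price
        # Stays None if no earlier price exists for this product type.
        filled.append(last_seen)
    return filled


# Made-up example: the item sells out after a discount to 0.99.
filled = fill_missing_prices([1.2, None, None, 0.99, None])
```

In this example the discounted 0.99 price is carried into the final day, which shows how a sold-out discount can keep the basket price low until a fresh price is collected.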

The dashboard doesn’t scale well on phones. I suggest viewing it in landscape to better see the graphs, or just using a monitor.

Technical information about the project

Data is collected and formatted using Python scripts with Selenium WebDriver, which are run daily at midnight, so fresh data is available at the start of each day. The collected data, along with the “recommended weekly food basket” table, is stored in a MySQL database. A separate Python script then calculates the “Food basket price”, making sure missing product prices are filled in using older data. This final script updates the main MySQL table used for the project. The data is then visualised using Metabase.

Selenium is by no means the most efficient way to achieve this result, but it does allow us to work with different marketplace websites that tend to use JavaScript and don’t have a straightforward way to extract the data. I use multithreading in the scripts to speed up the process, and the data is only collected once a day, so this is a good compromise for now.
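One common way to run several scrapers in parallel is a thread pool from the standard library. The sketch below is not the project's actual code: scrape_marketplace is a hypothetical stand-in for a Selenium routine, and using ThreadPoolExecutor is an assumption about how the multithreading could be done.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_marketplace(name: str) -> dict:
    """Hypothetical stand-in for a Selenium scraping routine.

    A real version would open a browser session, load the e-store pages
    and extract prices; here it only returns an empty result.
    """
    return {"marketplace": name, "prices": {}}


def scrape_all(names: list, max_workers: int = 4) -> list:
    """Scrape every marketplace in its own worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input names.
        return list(pool.map(scrape_marketplace, names))


results = scrape_all(["Barbora", "Lats"])
```

Threads suit this kind of job because most of the time is spent waiting on pages to load rather than on computation.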

The project will be published to GitHub once I’m done optimising things and have implemented some more advanced formulas to address some of the current drawbacks noted earlier.

Things to come

Once enough historical data has accumulated, I will post an update looking at how food prices have changed over time across product types and marketplaces. Currently I just don’t have enough data to draw any meaningful insights or show trends. I’m also thinking about making a Twitter (X) bot to post these trends and prices daily.