Data automation
Oslo Scraping: data pipeline for an AI chat
A Python pipeline that consolidates properties and developments from Tokko and Google Sheets, runs on its own via GitHub Actions and publishes a clean JSON that feeds the real estate agency's AI chat.
The problem
This project is the data piece of the Oslo Propiedades ecosystem. The website’s AI chat needs a reliable knowledge base: a single, clean and up-to-date catalog of every property and development. The problem is that this information does not live in one place. Part of it is in Tokko (the real estate CRM) and part of it is in a Google Sheets file the team maintains by hand. Not everything in one source is in the other.
If the chat consumes outdated or incomplete data, it answers badly. And a chat that answers badly about properties and prices is worse than having no chat at all.
The pipeline
I built a Python pipeline that handles the consolidation:
- It reads properties and developments from Tokko and from the Google Sheets file.
- It takes the PROYECTO column from the Sheet as the authoritative source for grouping (for example OSLO52278), instead of guessing by name or by URL.
- It consolidates everything into a grouped, clean and consistent JSON.
- It also generates a separate endpoint with a summary of developments, built from an independent sheet.
The result is a single catalog that reflects reality, with no duplicates and no gaps.
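The consolidation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes both sources share an "id" key, that non-empty Sheet values are layered over the Tokko record, and that records without a PROYECTO code go into a placeholder bucket. The function name, the field names other than PROYECTO, and the SIN_PROYECTO bucket are all hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical bucket for records that have no PROYECTO code in the Sheet.
UNGROUPED = "SIN_PROYECTO"


def consolidate(tokko_items, sheet_rows):
    """Merge both sources by id, then group records by the Sheet's PROYECTO code."""
    merged = {}

    # Start from the Tokko records (assumed to be dicts with an "id" key).
    for item in tokko_items:
        merged[str(item["id"])] = dict(item)

    # Layer the Sheet rows on top, skipping empty cells so they never
    # erase a value that Tokko already provides. Rows that exist only in
    # the Sheet create a new record.
    for row in sheet_rows:
        record = merged.setdefault(str(row["id"]), {})
        record.update({k: v for k, v in row.items() if v not in ("", None)})

    # Group by PROYECTO instead of guessing from names or URLs.
    groups = defaultdict(list)
    for key in sorted(merged):
        record = merged[key]
        groups[record.get("PROYECTO") or UNGROUPED].append(record)
    return dict(groups)


if __name__ == "__main__":
    tokko = [{"id": 1, "price": 100}, {"id": 2, "price": 200}]
    sheet = [{"id": 1, "PROYECTO": "OSLO52278"}, {"id": 3, "PROYECTO": "OSLO52278"}]
    print(json.dumps(consolidate(tokko, sheet), indent=2, ensure_ascii=False))
```

Keying the merge on a shared id removes duplicates, and taking the union of both sources closes the gaps left by records that exist in only one of them.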
Automation
The pipeline is not run by hand. A GitHub Actions workflow runs it on a schedule and, on each run, PATCHes a GitHub Gist with the fresh data. This has a concrete advantage: the JSON URL stays stable, so the consumer (the chat) never has to change where it points, yet always receives up-to-date data.
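As a rough sketch of the publishing step, the snippet below builds the PATCH request against GitHub's REST endpoint for updating a gist, using only the standard library. The function names, the gist id, the file name and the way the token is supplied are assumptions for illustration; the original workflow may do this differently.

```python
import json
import urllib.request


def build_gist_patch(gist_id, filename, catalog, token):
    """Build a PATCH request that overwrites one file inside an existing Gist."""
    # The Gist API expects the new file content as a string, so the
    # catalog is serialized to JSON text first.
    body = {
        "files": {
            filename: {"content": json.dumps(catalog, ensure_ascii=False, indent=2)}
        }
    }
    return urllib.request.Request(
        url=f"https://api.github.com/gists/{gist_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def publish(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

In a scheduled workflow, the token would typically be read from an environment variable populated from a repository secret. Because the request updates an existing Gist rather than creating a new one, the URL the chat reads from does not change between runs.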
The AI chat
That JSON is the knowledge base for the chat integrated into the Oslo website. With the real, up-to-date catalog, the chat can answer questions about available properties, filter by area or price and provide contact information, without making things up and without falling out of sync with the real inventory.
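To make the filtering concrete, here is one way a chat backend could query the grouped catalog by area or price. It is only a sketch: the "area" and "price" field names, the function name and the catalog shape (project code mapped to a list of records) are assumptions, not the published schema.

```python
def filter_properties(catalog, area=None, max_price=None):
    """Return the records that match the given area and/or price ceiling."""
    matches = []
    for project, records in catalog.items():
        for record in records:
            # Area comparison is case-insensitive; records without an area never match.
            if area is not None and str(record.get("area", "")).lower() != area.lower():
                continue
            # Records without a price are excluded when a ceiling is requested,
            # so the chat never presents a property as being within budget by guess.
            price = record.get("price")
            if max_price is not None and (price is None or price > max_price):
                continue
            matches.append({"project": project, **record})
    return matches
```

Answering only from records that pass the filter is what keeps the chat tied to the real inventory instead of inventing listings.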
This work complements Oslo’s main website (the WordPress to Next.js migration with Tokko): the website handles user experience and SEO, while the pipeline handles the quality and freshness of the data that feeds the AI.