
Data automation

Oslo Scraping: data pipeline for an AI chat

A Python pipeline that consolidates properties and developments from Tokko and Google Sheets, runs on its own via GitHub Actions and publishes a clean JSON that feeds the real estate agency's AI chat.

2024 · Client project
Python · GitHub Actions · Scraping · AI · Automation

Tech stack

The technologies used and why.

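Language: Python
Automation: GitHub Actions (scheduled runs, no manual execution)
Data source: Google Sheets API (the file the team maintains by hand)
CRM / API: Tokko Broker (the agency's real estate CRM)
Output: JSON (the chat's knowledge base)
Publishing: GitHub Gist (a stable URL that always serves fresh data)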

The problem

This project is the data piece of the Oslo Propiedades ecosystem. The website’s AI chat needs a reliable knowledge base: a single, clean and up-to-date catalog of every property and development. The problem is that this information does not live in one place. Part of it is in Tokko (the real estate CRM) and part of it is in a Google Sheets file the team maintains by hand. Not everything in one source is in the other.

If the chat consumes outdated or incomplete data, it answers badly. And a chat that answers badly about properties and prices is worse than having no chat at all.

The pipeline

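I built a Python pipeline that handles the consolidation: it pulls properties and developments from both Tokko and the Google Sheets file, merges the two sources and outputs a single clean JSON.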

The result is a single catalog that reflects reality, with no duplicates and no gaps.
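The consolidation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes both sources have already been fetched, that each record carries a shared identifier under a hypothetical `ref` field, and that Tokko takes precedence while the sheet only fills empty fields and adds listings Tokko lacks. The sheet input is shaped like the `values` matrix the Google Sheets API returns (header row first), which omits trailing empty cells.

```python
import json
from typing import Any

# Hypothetical record type: one property or development as a flat dict.
Record = dict[str, Any]


def sheet_rows_to_records(values: list[list[str]]) -> list[Record]:
    """Convert a Sheets API `values` matrix (header row first) into dicts.

    The Sheets API drops trailing empty cells, so short rows are padded
    with "" to line up with the header.
    """
    if not values:
        return []
    header, *rows = values
    return [dict(zip(header, row + [""] * (len(header) - len(row)))) for row in rows]


def _key(value: Any) -> str:
    # Normalize the shared identifier so "a1 " and "A1" count as the same listing.
    return str(value).strip().upper()


def _is_empty(value: Any) -> bool:
    return value is None or value == ""


def merge_catalog(tokko: list[Record], sheet: list[Record], key: str = "ref") -> list[Record]:
    """Merge both sources into one list with a single record per identifier.

    Assumed precedence: Tokko values win; the sheet only fills fields that
    are missing or empty, and contributes listings Tokko does not have.
    """
    merged: dict[str, Record] = {}
    for item in tokko:
        k = _key(item.get(key, ""))
        if k:
            merged[k] = dict(item)  # copy so the caller's data is not mutated
    for row in sheet:
        k = _key(row.get(key, ""))
        if not k:
            continue  # skip rows without an identifier
        target = merged.setdefault(k, {})
        for field, value in row.items():
            if not _is_empty(value) and _is_empty(target.get(field)):
                target[field] = value
    # Deterministic order keeps diffs of the published JSON small.
    return [merged[k] for k in sorted(merged)]


if __name__ == "__main__":
    # Illustrative data only.
    sheet = sheet_rows_to_records([["ref", "name", "price"], ["A1", "Apartment", ""], ["B2", "House", "100000"]])
    tokko = [{"ref": "a1", "name": "Apartment (Tokko)", "price": "85000"}]
    print(json.dumps(merge_catalog(tokko, sheet), indent=2))
```

Keying on a normalized identifier is what removes duplicates, and the fill-only-empty rule is what closes gaps without letting a stale sheet cell overwrite CRM data; if the team's sheet is meant to be authoritative for some fields, the precedence would need to flip for those.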

Automation

The pipeline is not run by hand. A GitHub Actions workflow runs it on a schedule and, on each run, PATCHes a GitHub Gist with the fresh data. This has a concrete advantage: the JSON URL stays stable, so the consumer (the chat) never has to change where it points, yet always receives up-to-date data.
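The publishing step maps onto GitHub's REST endpoint `PATCH /gists/{gist_id}`, whose body replaces the content of named files. Below is a minimal standard-library sketch of that call. The environment variable names, the `catalog.json` filename and the helper names are hypothetical. Note that the token must be allowed to write gists; the built-in workflow `GITHUB_TOKEN` is scoped to the repository, so a personal access token with the `gist` scope stored as a repository secret is the usual choice.

```python
import json
import os
import urllib.request
from typing import Any

GITHUB_API = "https://api.github.com"


def build_gist_patch(gist_id: str, filename: str, catalog: list[dict[str, Any]], token: str) -> urllib.request.Request:
    """Build a PATCH /gists/{gist_id} request that replaces one file's content."""
    payload = {
        "files": {
            filename: {"content": json.dumps(catalog, ensure_ascii=False, indent=2)}
        }
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/gists/{gist_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def publish(catalog: list[dict[str, Any]], gist_id: str, filename: str, token: str) -> int:
    """Send the PATCH and return the HTTP status; urlopen raises on 4xx/5xx."""
    request = build_gist_patch(gist_id, filename, catalog, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical names: GIST_ID / GIST_TOKEN env vars (e.g. from workflow secrets)
    # and a catalog.json written by the earlier pipeline steps.
    with open("catalog.json", encoding="utf-8") as fh:
        data = json.load(fh)
    status = publish(data, os.environ["GIST_ID"], "catalog.json", os.environ["GIST_TOKEN"])
    print(f"Gist updated: HTTP {status}")
```

Because the gist is edited in place rather than recreated, its raw URL without a revision hash keeps pointing at the latest version, which is what lets the chat keep a fixed address.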

The AI chat

That JSON is the knowledge base for the chat integrated into the Oslo website. With the real, up-to-date catalog, the chat can answer questions about available properties, filter by area or price and provide contact information, without making things up and without falling out of sync with the real inventory.
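How the chat queries the JSON is not described here. As one hypothetical illustration of answering "filter by area or price" from the real catalog instead of from the model's memory, a backend tool the chat calls could run a deterministic filter like the one below. The `area` and `price` field names are assumptions about the catalog schema.

```python
from typing import Any, Optional


def parse_price(value: Any) -> Optional[float]:
    """Return a numeric price, or None when the field is empty or not numeric."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    try:
        return float(str(value).strip())
    except ValueError:
        return None


def filter_catalog(
    catalog: list[dict[str, Any]],
    area: Optional[str] = None,
    max_price: Optional[float] = None,
) -> list[dict[str, Any]]:
    """Return listings matching an area (case-insensitive) and a price ceiling."""
    area_norm = area.strip().lower() if area is not None else None
    matches = []
    for item in catalog:
        if area_norm is not None and str(item.get("area", "")).strip().lower() != area_norm:
            continue
        if max_price is not None:
            price = parse_price(item.get("price"))
            if price is None or price > max_price:
                continue  # unknown prices are excluded rather than guessed
        matches.append(item)
    return matches
```

Excluding listings with an unknown price, rather than guessing, mirrors the goal stated above: the chat should only repeat what the inventory actually says.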

This work complements Oslo’s main website (the WordPress to Next.js migration with Tokko): while that project addresses user experience and SEO, this pipeline ensures the quality and freshness of the data that feeds the AI.