Process automation

Agency reporting in 12 minutes: a Notion to Looker case

A 14-person Rotterdam agency was burning two senior days a month assembling client decks. We cut the ritual to twelve minutes with one Python file.

Jacob Molkenboer· Founder · A Brand New Company· 19 Apr 2024· 9 min

Brass stopwatch beside twine-bound cream report cards with green tab and red wax seal on ivory desk.

The first Monday of June, a marketing agency in Rotterdam. The ops lead opens eleven browser tabs before the espresso machine finishes hissing. Notion for the project tracker. Google Analytics for three clients whose logins she remembers. Meta Ads Manager for six clients. LinkedIn Campaign Manager for the B2B half. Search Console. An Airtable that someone built in 2023 to track retainers. By the time she has copy-pasted client one into the monthly template it is 10:14am, and she has answered four Slack pings about "the deck."

This is what monthly reporting looked like at a 14-person agency we worked with this spring. Two senior staff spent two full working days each month assembling 23 client decks. The numbers were already six days old by the time the decks reached the clients. At blended rates, the agency was burning roughly 6,000 EUR every month moving numbers from one screen into another.

We cut it to twelve minutes. The twelve minutes is review time. The actual pipeline runs at 06:00 on the first of the month, on a 4 EUR Hetzner box, and is finished before anyone is awake. Here is how it was built, and the parts of it we deliberately left out.

The audit came first

Nobody had ever written down where the agency's reporting data actually lived. So we spent the first afternoon making a list. Twelve sources in total. Six were ad platforms the agency was already exporting to a shared Google Drive folder as monthly CSVs. Two were SaaS dashboards with usable APIs (Hotjar, Brevo). One was a Plausible instance the agency self-hosted. The remaining three were Notion, Airtable, and a Google Sheet a freelancer maintained for organic social.

The bottleneck was not the dashboards. The bottleneck was reconciliation. The same client appeared as "De Witte Bakker" in Notion, "Witte Bakker BV" in Meta Ads, "dewittebakker.nl" in Google Analytics, and "witte_bakker" in the Airtable invoicing tag. A junior was paid to make those names match every month, by eye. Half of the senior time was actually spent fixing the junior's matches.

Takeaway

If your reporting feels slow, the dashboards are almost never the problem. The reconciliation between systems is the problem. Fix that and the dashboards build themselves.

One source of truth for client identity

We picked Notion. Not because Notion is the best database (it is not), but because the agency already lived in Notion and we needed the ops team to keep editing one place, not two. Every client row in the Notion "Clients" database got a stable internal ID, a canonical name, and three columns of platform-specific aliases: GA property ID, Meta ad account ID, LinkedIn account ID.

That single change, before we wrote any code, removed the manual reconciliation step entirely. From then on, every downstream join keyed off the internal ID instead of a fuzzy name match. The ops lead spent half a day filling in the alias columns. She has not touched them since.

The shape of the pipeline

We settled on the simplest stack that could survive a year of neglect. Notion as the client dimension. Google BigQuery as the warehouse. Looker Studio for the client-facing report. One Python worker in between, scheduled by systemd.

Notion exposes a well-documented REST API with cursor pagination and a three-request-per-second rate limit. BigQuery accepts batched inserts cheaply. Looker Studio reads from BigQuery natively and renders dashboards that clients can be given view-only access to without another login.

We did consider Zapier, n8n, and Make for this. All three would have worked. We ruled them out because the agency had been burned twice already by a Zapier flow nobody understood, and because the per-task pricing on Zapier scales badly when you are syncing 23 clients across six platforms. We ruled out Fivetran for the same pricing reason. For the ad platforms we used vendor-native BigQuery exports where they existed (Google Ads, Search Console, GA4) and a single self-hosted Airbyte instance for the rest. That meant most of the data was already landing in BigQuery without our worker touching it. The worker's only jobs were syncing the Notion clients table and running the transformation that produced the unified report.

The Python worker, in one file

Here is the meat of the Notion to BigQuery sync. It is roughly the file we shipped, with the project name changed.

import os
import time
import requests
from google.cloud import bigquery

NOTION_TOKEN = os.environ["NOTION_TOKEN"]
NOTION_DB = os.environ["NOTION_CLIENTS_DB"]
BQ_TABLE = "agency_warehouse.dim_clients"

HEADERS = {
    "Authorization": f"Bearer {NOTION_TOKEN}",
    "Notion-Version": "2022-06-28",
    "Content-Type": "application/json",
}

def fetch_all_clients():
    url = f"https://api.notion.com/v1/databases/{NOTION_DB}/query"
    cursor = None
    while True:
        body = {"page_size": 100}
        if cursor:
            body["start_cursor"] = cursor
        r = requests.post(url, headers=HEADERS, json=body, timeout=30)
        r.raise_for_status()
        data = r.json()
        for row in data["results"]:
            yield flatten(row)
        if not data.get("has_more"):
            return
        cursor = data["next_cursor"]
        time.sleep(0.35)  # stay under 3 req/s

def flatten(page):
    p = page["properties"]
    return {
        "client_id": page["id"],
        "canonical_name": text(p["Name"]),
        "ga_property_id": text(p.get("GA Property")),
        "meta_account_id": text(p.get("Meta Account")),
        "li_account_id": text(p.get("LinkedIn Account")),
        "retainer_eur": number(p.get("Retainer EUR")),
        "active": p["Active"]["checkbox"],
        "last_synced": page["last_edited_time"],
    }

def text(prop):
    if not prop:
        return None
    parts = prop.get("title") or prop.get("rich_text") or []
    return "".join(part["plain_text"] for part in parts) or None

def number(prop):
    return prop.get("number") if prop else None

def write_to_bq(rows):
    client = bigquery.Client()
    job = client.load_table_from_json(
        list(rows),
        BQ_TABLE,
        job_config=bigquery.LoadJobConfig(
            write_disposition="WRITE_TRUNCATE",
            schema_update_options=["ALLOW_FIELD_ADDITION"],
        ),
    )
    job.result()

if __name__ == "__main__":
    write_to_bq(fetch_all_clients())

That is the whole thing. Sixty lines of Python, no framework, no orchestrator. It runs in about nine seconds against the agency's 23-row clients database. A systemd timer fires it every morning at 05:55. If it fails, the timer retries five minutes later and posts to a Slack channel the ops lead actually reads. We deliberately did not wire it up to Airflow or Prefect. At this scale they are theatre.

The transformation step

Once the clients dimension lives in BigQuery, the rest of the work is SQL. We wrote a single scheduled query that joins the ad-platform fact tables to the clients dimension on the internal IDs, computes monthly aggregates per channel, and writes the result to a monthly_client_summary table. Looker Studio reads from that table.

We used BigQuery's scheduled queries rather than dbt for this. dbt is the right answer when you have fifty models that depend on each other. We had four, and none of them shared logic. The scheduled query is forty lines of SQL and has not been touched in five months. When the agency hires a data analyst (which we suggested they do), the migration to dbt will take an afternoon.

The Looker Studio template

One report template, parameterized by client ID. The ops lead duplicates the template per client, sets the parameter, and shares the URL with the client. Clients see their own data only, because the BigQuery view filters on the parameter. There is no per-client deck anymore. The "deck" is a live link the client can open whenever they want.

The agency was nervous about giving clients live access at first. Three months in, two clients have asked for the link to be locked back down to a monthly PDF snapshot. The other twenty-one prefer the live version and check it themselves between calls. One of them started flagging campaign drift on a Wednesday afternoon instead of waiting for the monthly call. That alone has paid for the project.

The gotchas we hit

Three things bit us. They are worth flagging because they will bite anyone doing the same thing.

First, Notion silently renames properties. If someone in the agency renames the "Meta Account" column to "Meta Ad Account" through the Notion UI, the Notion API stops returning that field under the old name. Our flattener used to crash. We now fingerprint the database schema at the start of each run, log any drift, and Slack the ops lead if a known column has vanished. The flattener tolerates missing columns and emits NULL.

Second, Notion's rich-text fields can be split across multiple text runs even when they look like one string in the UI. If a user copy-pastes a value with different formatting, you get a list of three text objects, not one. The text() helper above joins them. We learned that the hard way after a client's GA property ID came back as "G-ABC" instead of "G-ABC123XYZ" because of a stray font change halfway through the string.

Third, WRITE_TRUNCATE is fine for a 23-row dimension. It would be wrong for a fact table. If you copy this pattern for anything larger than a few thousand rows, use a MERGE statement keyed off the Notion page ID instead. Truncate-and-reload at scale will burn through your slot quota and break any downstream materialised view that depends on stable row identity.

Warning

Never trust a Notion column name to be stable. Always fingerprint the schema at the top of the run and fail loudly if a column you depend on has been renamed or deleted.

What it actually cost

The full bill, monthly: 4 EUR for the Hetzner box, around 2 EUR in BigQuery storage and query costs at the agency's volume, zero for Looker Studio, zero for the Notion API. The Airbyte instance runs on the same Hetzner box. Total infrastructure: about 7 EUR a month, against roughly 6,000 EUR a month in senior time we replaced.

Build time was four working days, end to end, including the source audit and one round of revisions on the Looker template. The agency had a working version on day three. The fourth day was spent removing the bits we did not need.

What we deliberately did not build

No internal admin UI. The ops lead edits Notion, the way she always has. No anomaly detection. Looker Studio has conditional formatting and the agency wanted to see drift with their own eyes for the first six months. No Slack bot that posts daily numbers. The team has enough Slack noise. No machine-learning model that predicts churn. The agency does not have the data volume to train one, and a logistic regression on retainer length and campaign spend would tell them the same thing for free.

Every one of those was floated in the kickoff. Each one would have added a week of build, a week of training, and a maintenance tail. The point of the project was to give two senior people back two days a month. Anything that did not contribute to that got cut.

The smaller idea inside the bigger idea

The interesting part of this project was not the pipeline. The interesting part was the audit. Once we had drawn the data flow on one A3 page, the agency could see that 80% of the two-day monthly ritual was reconciliation. Fix the reconciliation, and the rest is plumbing.

You can do the audit without us. Open a notebook, write down every system your reporting touches, and draw an arrow for every manual copy-paste between them. The arrows are the work. The boxes are not.

When we built this pipeline for the Rotterdam agency, the wrinkle we did not see coming was Notion's silent property renaming, and we solved it by fingerprinting the schema on every run and failing loudly when it drifted. If your team has a similar reporting ritual eating senior hours every month, this is the kind of process automation we ship.

One thing you can do today: list every system your last monthly report pulled from, and circle the ones whose client identifier does not match across systems. That circle is the project.

Key takeaway

If monthly reporting feels slow, the dashboards are not the problem. Reconciliation between systems is the problem. Fix that first and the rest is plumbing.

FAQ

Why use Notion as the source of truth and not a real database?

Because the agency already lived in Notion. Moving the dimension to Postgres would have added a second place to edit. The 23-row dataset never strained the Notion API.

Why BigQuery and Looker Studio instead of Metabase or Power BI?

BigQuery's free tier covered the agency's volume, Looker Studio is free, and clients can view dashboards without another login. Metabase needs hosting; Power BI needs licences.

Would this work for a 100-client agency?

The shape works. The truncate-and-reload pattern does not. Swap WRITE_TRUNCATE for a MERGE keyed on the Notion page ID, and move the transformation into dbt once you have more than ten models.

How long did it take to build?

Four working days end to end, including the source audit and one round of Looker revisions. The agency had a working version on day three.

process automationautomationcase studyintegrationsworkflowoperations

Building something?

Start a project