Databases

MySQL to Postgres: 14 gotchas pgloader quietly skipped

On Friday the pgloader output said zero errors. On Monday the invoicing module returned nothing. A cheatsheet from a 240-table MySQL to Postgres migration.

Jacob Molkenboer· Founder · A Brand New Company· 3 Sept 2024· 9 min

Open oak index-card drawer with brass divider, one cream card tabbed in chartreuse, ledger stack and wax-sealed bundle beside it.

Friday, 16:40, Eindhoven. The pgloader summary scrolled past in green: 240 tables, 31 million rows, eleven minutes, zero fatal errors. The new RDS Postgres instance was warm. Smoke tests passed. We closed the laptop.

Monday, 08:12. A Teams ping from the controller at the Brabant industrial supplier we had just cut over. The invoicing module shows zero open invoices. She can see them in pgAdmin. The PHP just refuses to find them.

What follows is the cheatsheet we wish someone had handed us. Fourteen places where pgloader's output read clean but the system, end to end, was not. Grouped by where the silence lives: first the gotchas closest to pgloader's own job, then the ones at the application layer where MySQL was simply more forgiving.

A note on shape. pgloader is excellent at the mechanical work: schema translation, data copy, sequence creation. It is not, and does not claim to be, an application-layer translator. Most of what follows lives in the gap between "the data is in Postgres" and "the PHP application reads from Postgres correctly". With that out of the way:

Schema and type translations that read green

The first cluster sits closest to pgloader's actual job. The translations succeed and the data lands. The translated artefact is then wrong in a way that smoke tests do not surface, because the application is reading exactly the columns it expects to read. These are the ones we think pgloader should have flagged loudest and flagged least.

Zero dates nullified without a peep
MySQL with permissive sql_mode happily stored '0000-00-00 00:00:00' in NOT NULL DATETIME columns. There were 17,400 rows of these, mostly from a 2017 import script that never set due_date. pgloader replaced them with NULL on copy and quietly dropped the NOT NULL on the way through. The "open invoices" query did ORDER BY due_date ASC, and Postgres sorts NULLs last by default. The invoices weren't missing. They were below the fold of an internal list view nobody scrolled. Fix: ORDER BY due_date ASC NULLS FIRST, plus an audit of every column where pgloader nulled a zero date.
TINYINT(1) coerced to BOOLEAN, but only mostly
pgloader's cast type tinyint when (= 1 precision) to boolean rule catches most tinyint(1) columns. It missed two where the original DDL had been hand-edited to plain tinyint without precision. Existing PHP wrote WHERE active = 1 and kept working against the smallint columns. New code we wrote on top of the migrated schema used WHERE active IS TRUE, which errored against those two. We spent half a day chasing what looked like an intermittent driver bug.
AUTO_INCREMENT sequence not advanced past MAX(id)
The classic. pgloader seeds the sequence from MySQL's auto_increment counter, which is reliable until somebody back-filled rows with explicit high ids and the counter drifted behind. The first INSERT after cutover collides on the primary key.
```
SELECT setval(
  pg_get_serial_sequence('invoice', 'id'),
  (SELECT MAX(id) FROM invoice)
);
```
Run that for every table with a serial column before you let the app write a single row.
ON UPDATE CURRENT_TIMESTAMP silently dropped
MySQL's updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP becomes updated_at timestamp DEFAULT now() in the Postgres DDL pgloader emits. The ON UPDATE clause has no column-level equivalent in Postgres. It just disappears. Six days later the sales lead asked why the "last touched" column on the customer list had stopped advancing. The answer was: nothing was advancing it. We added a generic BEFORE UPDATE trigger that bumps updated_at, applied across 41 tables.
utf8mb3 byte sequences Postgres clients won't accept on read
The oldest tables were declared utf8 on a MySQL server from before utf8mb4 was the default. A handful of rows had smart quotes pasted out of Word in 2014 that were borderline even under MySQL's lax rules. pgloader copied them in. Postgres held them happily on disk. One client library (the audit log SDK, in Go) rejected them on SELECT because its UTF8 decoder was stricter than PDO's. Postgres itself was fine. The SDK couldn't talk to it.
UNSIGNED INT widened to BIGINT, then truncated by the application
Range-preserving and correct: pgloader maps INT UNSIGNED to BIGINT. Our PHP did (int) $row['quantity'] on values that, once they could legally exceed 2^31, did. An ETL feed into a Sage X3 instance overflowed silently. Nothing in pgloader's output suggested checking the consumers.
ENUMs become CHECK constraints with autogenerated names
pgloader translates MySQL ENUM into either a Postgres custom TYPE or a CHECK constraint, depending on the flag set. The constraint name it picks does not match the naming convention your migration tool expects. Six months later we wanted to add a value to status and our Phinx migration couldn't find the constraint to drop. Rename them after migration if you plan to evolve the schema later. The half hour you spend renaming saves a frustrating afternoon when the schema needs to move.

Application-layer semantics MySQL quietly forgave

Past this point pgloader has no business flagging anything. The data is in the right shape. What is wrong is that the application was written in the MySQL dialect of SQL: looser typing, opinionated string handling, and shortcuts Postgres does not provide. These break at runtime, in the PHP, often weeks after cutover when a code path that hadn't been exercised yet finally is.

Reserved word identifiers that worked fine in MySQL
Tables named user, order, group. pgloader quotes them in the schema, so the relations exist. The PHP DAO did not quote them, because MySQL doesn't make you. SELECT * FROM user WHERE id = ? errors in Postgres with relation "user" does not exist. A grep over the entire codebase found 312 of these queries. A few were in template strings concatenated at runtime, which the grep didn't catch.
Implicit type coercion in WHERE clauses
MySQL evaluates WHERE invoice_number = 12345 against a VARCHAR column by coercing the integer to a string. Postgres throws operator does not exist: character varying = integer. This is purely an application-layer issue. pgloader has nothing to flag. Search the codebase for any WHERE that compares a literal of one type to a column of another, and cast the literal.
GROUP BY without every non-aggregated column
MySQL with default sql_mode permits SELECT a, b, COUNT(*) FROM t GROUP BY a. Postgres requires b in the GROUP BY or wrapped in an aggregate. The internal sales dashboard had eight queries that did this. Four of them only failed under certain WHERE filters because of how MySQL had been picking an arbitrary row for b. The numbers had been wrong for years. Postgres surfaced it.
CONCAT returns NULL when any argument is NULL
MySQL's CONCAT('Hello, ', NULL) returns an empty-coerced string under some modes. Postgres returns NULL. The invoice mail merge used CONCAT(first_name, ' ', last_name) for the salutation. 380 historical contacts had no first_name. For a week, those clients received "Dear ," at the top of their payment reminder. Switch to the lower-case Postgres concat() function, which ignores NULLs, or wrap each argument in COALESCE.
ON DUPLICATE KEY UPDATE has no direct equivalent
The PHP used it in fourteen places, most for the daily product-stock sync. The Postgres equivalent is INSERT ... ON CONFLICT (column) DO UPDATE SET ..., documented at postgresql.org/docs. You must name the conflict target, which means picking a unique constraint or index by hand. pgloader cannot guess that for you. Port each occurrence one at a time and write a test for each.
Backticks vs double quotes, and the case-sensitivity trap
MySQL SELECT `order`.`status` FROM `order` becomes Postgres SELECT "order"."status" FROM "order". Mechanically a sed pass. The trap is that quoted identifiers in Postgres are case-sensitive: "Order" and "order" are different relations. If your codebase mixes Order and order across different files, pick one form, normalise the schema, then run the sed.
FULLTEXT indexes silently dropped
pgloader does not translate FULLTEXT indexes or MATCH () AGAINST () queries, and arguably shouldn't try. The product search on the sales portal returned zero results for two days before someone in the warehouse mentioned it in a side comment. The Postgres fix is a generated tsvector column with a GIN index, with the queries rewritten to use plainto_tsquery or websearch_to_tsquery. Postgres full-text search is good once you set it up. It is not free on cutover day.

Warning

pgloader doing its job correctly is necessary, not sufficient. Twelve of these fourteen would still have happened if pgloader had reported zero warnings and the schema diff was clean. The application layer is the migration.

The pre-pgloader SQL audit

Before pgloader runs at all, do a SQL grep. Search the application code for: tinyint, boolean, CONCAT, IFNULL, ON DUPLICATE, INSERT IGNORE, FULLTEXT, MATCH (, backtick identifiers, every reserved word, and every WHERE comparing two different types. That grep, not the pgloader output, is the migration plan. Two days well spent up front saves a week of Monday-morning Teams pings later.

When we ran the cutover for the Brabant supplier (a 240-table custom PHP ERP that had been growing for sixteen years), the thing that hurt most was not the schema. It was the assumption that a green pgloader summary meant the system was correct. Our legacy migration work now fronts every cutover with a two-week SQL audit before pgloader sees a single DDL file.

Five-minute audit you can run today on your own application repo: grep -RIn --include='*.php' -E 'FULLTEXT|MATCH \(|ON DUPLICATE KEY|INSERT IGNORE' .. The hit count is the size of the work pgloader will not do for you.

Key takeaway

pgloader translates the schema. The migration is what you do to the application layer afterwards.

FAQ

Should we use pgloader or write the migration by hand?

Use pgloader for the schema and data copy. Write the application-layer audit yourself. pgloader earns its keep when paired with a SQL grep of your codebase done before cutover, not as a replacement for one.

How long does a 200-table MySQL to Postgres migration take?

The pgloader run itself is minutes to a few hours. The application-layer work, including query rewrites, driver swaps, and triggers to replace ON UPDATE columns, is usually two to four weeks for a system of this size.

Does pgloader handle stored procedures and triggers?

No. MySQL stored procedures, triggers, and most views need to be re-implemented in PL/pgSQL by hand. pgloader does not warn you about them. Inventory them before you start so you know the scope.

What is the most common cause of queries failing after a MySQL to Postgres cutover?

Implicit type coercion. MySQL silently coerces integers to strings and back; Postgres does not. Most queries that break on day one are WHERE clauses comparing a literal of one type to a column of another.

mysqlmigrationphplegacy sitesarchitectureoperations

Building something?

Start a project

MySQL to Postgres: 14 gotchas pgloader quietly skipped

Schema and type translations that read green

Zero dates nullified without a peep

TINYINT(1) coerced to BOOLEAN, but only mostly

AUTO_INCREMENT sequence not advanced past MAX(id)

ON UPDATE CURRENT_TIMESTAMP silently dropped

utf8mb3 byte sequences Postgres clients won't accept on read

UNSIGNED INT widened to BIGINT, then truncated by the application

ENUMs become CHECK constraints with autogenerated names

Application-layer semantics MySQL quietly forgave

Reserved word identifiers that worked fine in MySQL

Implicit type coercion in WHERE clauses

GROUP BY without every non-aggregated column

CONCAT returns NULL when any argument is NULL

ON DUPLICATE KEY UPDATE has no direct equivalent

Backticks vs double quotes, and the case-sensitivity trap

FULLTEXT indexes silently dropped

The pre-pgloader SQL audit

FAQ

Building something?