Databases
MySQL to Postgres: 14 gotchas pgloader quietly skipped
On Friday the pgloader output said zero errors. On Monday the invoicing module returned nothing. A cheatsheet from a 240-table MySQL to Postgres migration.

Friday, 16:40, Eindhoven. The pgloader summary scrolled past in green: 240 tables, 31 million rows, eleven minutes, zero fatal errors. The new RDS Postgres instance was warm. Smoke tests passed. We closed the laptop.
Monday, 08:12. A Teams ping from the controller at the Brabant industrial supplier we had just cut over. The invoicing module shows zero open invoices. She can see them in pgAdmin. The PHP just refuses to find them.
What follows is the cheatsheet we wish someone had handed us. Fourteen places where pgloader's output read clean but the system, end to end, was not. Grouped by where the silence lives: first the gotchas closest to pgloader's own job, then the ones at the application layer where MySQL was simply more forgiving.
A note on shape. pgloader is excellent at the mechanical work: schema translation, data copy, sequence creation. It is not, and does not claim to be, an application-layer translator. Most of what follows lives in the gap between "the data is in Postgres" and "the PHP application reads from Postgres correctly". With that out of the way:
Schema and type translations that read green
The first cluster sits closest to pgloader's actual job. The translations succeed and the data lands. The translated artefact is then wrong in a way that smoke tests do not surface, because the application is reading exactly the columns it expects to read. These are the ones we think pgloader should have flagged loudest and flagged least.
Zero dates nullified without a peep
MySQL with permissive
sql_modehappily stored'0000-00-00 00:00:00'in NOT NULL DATETIME columns. There were 17,400 rows of these, mostly from a 2017 import script that never setdue_date. pgloader replaced them with NULL on copy and quietly dropped the NOT NULL on the way through. The "open invoices" query didORDER BY due_date ASC, and Postgres sorts NULLs last by default. The invoices weren't missing. They were below the fold of an internal list view nobody scrolled. Fix:ORDER BY due_date ASC NULLS FIRST, plus an audit of every column where pgloader nulled a zero date.TINYINT(1) coerced to BOOLEAN, but only mostly
pgloader's
cast type tinyint when (= 1 precision) to booleanrule catches mosttinyint(1)columns. It missed two where the original DDL had been hand-edited to plaintinyintwithout precision. Existing PHP wroteWHERE active = 1and kept working against the smallint columns. New code we wrote on top of the migrated schema usedWHERE active IS TRUE, which errored against those two. We spent half a day chasing what looked like an intermittent driver bug.AUTO_INCREMENT sequence not advanced past MAX(id)
The classic. pgloader seeds the sequence from MySQL's
auto_incrementcounter, which is reliable until somebody back-filled rows with explicit high ids and the counter drifted behind. The first INSERT after cutover collides on the primary key.SELECT setval( pg_get_serial_sequence('invoice', 'id'), (SELECT MAX(id) FROM invoice) );Run that for every table with a serial column before you let the app write a single row.
ON UPDATE CURRENT_TIMESTAMP silently dropped
MySQL's
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMPbecomesupdated_at timestamp DEFAULT now()in the Postgres DDL pgloader emits. The ON UPDATE clause has no column-level equivalent in Postgres. It just disappears. Six days later the sales lead asked why the "last touched" column on the customer list had stopped advancing. The answer was: nothing was advancing it. We added a genericBEFORE UPDATEtrigger that bumpsupdated_at, applied across 41 tables.utf8mb3 byte sequences Postgres clients won't accept on read
The oldest tables were declared
utf8on a MySQL server from before utf8mb4 was the default. A handful of rows had smart quotes pasted out of Word in 2014 that were borderline even under MySQL's lax rules. pgloader copied them in. Postgres held them happily on disk. One client library (the audit log SDK, in Go) rejected them on SELECT because its UTF8 decoder was stricter than PDO's. Postgres itself was fine. The SDK couldn't talk to it.UNSIGNED INT widened to BIGINT, then truncated by the application
Range-preserving and correct: pgloader maps
INT UNSIGNEDtoBIGINT. Our PHP did(int) $row['quantity']on values that, once they could legally exceed 2^31, did. An ETL feed into a Sage X3 instance overflowed silently. Nothing in pgloader's output suggested checking the consumers.ENUMs become CHECK constraints with autogenerated names
pgloader translates MySQL ENUM into either a Postgres custom TYPE or a CHECK constraint, depending on the flag set. The constraint name it picks does not match the naming convention your migration tool expects. Six months later we wanted to add a value to
statusand our Phinx migration couldn't find the constraint to drop. Rename them after migration if you plan to evolve the schema later. The half hour you spend renaming saves a frustrating afternoon when the schema needs to move.
Application-layer semantics MySQL quietly forgave
Past this point pgloader has no business flagging anything. The data is in the right shape. What is wrong is that the application was written in the MySQL dialect of SQL: looser typing, opinionated string handling, and shortcuts Postgres does not provide. These break at runtime, in the PHP, often weeks after cutover when a code path that hadn't been exercised yet finally is.
Reserved word identifiers that worked fine in MySQL
Tables named
user,order,group. pgloader quotes them in the schema, so the relations exist. The PHP DAO did not quote them, because MySQL doesn't make you.SELECT * FROM user WHERE id = ?errors in Postgres withrelation "user" does not exist. A grep over the entire codebase found 312 of these queries. A few were in template strings concatenated at runtime, which the grep didn't catch.Implicit type coercion in WHERE clauses
MySQL evaluates
WHERE invoice_number = 12345against a VARCHAR column by coercing the integer to a string. Postgres throwsoperator does not exist: character varying = integer. This is purely an application-layer issue. pgloader has nothing to flag. Search the codebase for any WHERE that compares a literal of one type to a column of another, and cast the literal.GROUP BY without every non-aggregated column
MySQL with default
sql_modepermitsSELECT a, b, COUNT(*) FROM t GROUP BY a. Postgres requiresbin the GROUP BY or wrapped in an aggregate. The internal sales dashboard had eight queries that did this. Four of them only failed under certain WHERE filters because of how MySQL had been picking an arbitrary row forb. The numbers had been wrong for years. Postgres surfaced it.CONCAT returns NULL when any argument is NULL
MySQL's
CONCAT('Hello, ', NULL)returns an empty-coerced string under some modes. Postgres returns NULL. The invoice mail merge usedCONCAT(first_name, ' ', last_name)for the salutation. 380 historical contacts had nofirst_name. For a week, those clients received "Dear ," at the top of their payment reminder. Switch to the lower-case Postgresconcat()function, which ignores NULLs, or wrap each argument inCOALESCE.ON DUPLICATE KEY UPDATE has no direct equivalent
The PHP used it in fourteen places, most for the daily product-stock sync. The Postgres equivalent is
INSERT ... ON CONFLICT (column) DO UPDATE SET ..., documented at postgresql.org/docs. You must name the conflict target, which means picking a unique constraint or index by hand. pgloader cannot guess that for you. Port each occurrence one at a time and write a test for each.Backticks vs double quotes, and the case-sensitivity trap
MySQL
SELECT `order`.`status` FROM `order`becomes PostgresSELECT "order"."status" FROM "order". Mechanically a sed pass. The trap is that quoted identifiers in Postgres are case-sensitive:"Order"and"order"are different relations. If your codebase mixesOrderandorderacross different files, pick one form, normalise the schema, then run the sed.FULLTEXT indexes silently dropped
pgloader does not translate
FULLTEXTindexes orMATCH () AGAINST ()queries, and arguably shouldn't try. The product search on the sales portal returned zero results for two days before someone in the warehouse mentioned it in a side comment. The Postgres fix is a generatedtsvectorcolumn with a GIN index, with the queries rewritten to useplainto_tsqueryorwebsearch_to_tsquery. Postgres full-text search is good once you set it up. It is not free on cutover day.
pgloader doing its job correctly is necessary, not sufficient. Twelve of these fourteen would still have happened if pgloader had reported zero warnings and the schema diff was clean. The application layer is the migration.
The pre-pgloader SQL audit
Before pgloader runs at all, do a SQL grep. Search the application code for: tinyint, boolean, CONCAT, IFNULL, ON DUPLICATE, INSERT IGNORE, FULLTEXT, MATCH (, backtick identifiers, every reserved word, and every WHERE comparing two different types. That grep, not the pgloader output, is the migration plan. Two days well spent up front saves a week of Monday-morning Teams pings later.
When we ran the cutover for the Brabant supplier (a 240-table custom PHP ERP that had been growing for sixteen years), the thing that hurt most was not the schema. It was the assumption that a green pgloader summary meant the system was correct. Our legacy migration work now fronts every cutover with a two-week SQL audit before pgloader sees a single DDL file.
Five-minute audit you can run today on your own application repo: grep -RIn --include='*.php' -E 'FULLTEXT|MATCH \(|ON DUPLICATE KEY|INSERT IGNORE' .. The hit count is the size of the work pgloader will not do for you.
Key takeaway
pgloader translates the schema. The migration is what you do to the application layer afterwards.
FAQ
Should we use pgloader or write the migration by hand?
Use pgloader for the schema and data copy. Write the application-layer audit yourself. pgloader earns its keep when paired with a SQL grep of your codebase done before cutover, not as a replacement for one.
How long does a 200-table MySQL to Postgres migration take?
The pgloader run itself is minutes to a few hours. The application-layer work, including query rewrites, driver swaps, and triggers to replace ON UPDATE columns, is usually two to four weeks for a system of this size.
Does pgloader handle stored procedures and triggers?
No. MySQL stored procedures, triggers, and most views need to be re-implemented in PL/pgSQL by hand. pgloader does not warn you about them. Inventory them before you start so you know the scope.
What is the most common cause of queries failing after a MySQL to Postgres cutover?
Implicit type coercion. MySQL silently coerces integers to strings and back; Postgres does not. Most queries that break on day one are WHERE clauses comparing a literal of one type to a column of another.