Databases
Reading .sql dumps without importing them: a CLI audit
You inherited a 4.2 GB .sql.gz from the previous agency with one Sunday-night line: get the site back up by Monday. Read the dump cold from the terminal first.

The dump in your inbox at 11 pm
A client emailed a 4.2 GB .sql.gz from the previous agency on a Sunday night with one line: "this is everything from the old server, please get the site back up by Monday." We did not import it. Not Sunday night, not Monday morning. We read it first.
Importing an inherited dump before you understand its shape is how teams end up with a staging database that secretly carries two million rows of session garbage, two character sets fighting for the same column, or an API key sitting in plaintext inside a wp_options row. Twenty minutes with GNU coreutils will tell you what you actually inherited.
The first three commands
Before anything else, run these against the file as it sits on disk:
file dump.sql.gz
gunzip -c dump.sql.gz | head -200
gunzip -c dump.sql.gz | wc -lfile confirms whether the archive really is a gzip-compressed mysqldump, a tarball, or a Windows-encoded UTF-16 file that will explode in your terminal. head -200 shows the mysqldump version header, the server version that produced the dump, the default character set, and usually the first two or three CREATE TABLE statements. wc -l gives a line count, which is the cheapest sanity check available. A 4 GB dump with 12 lines was written with --extended-insert, one INSERT per table. Eight million lines means per-row inserts, and the import will take an order of magnitude longer.
Mapping every table without an import
This one is the workhorse:
gunzip -c dump.sql.gz | grep -E "^CREATE TABLE" | awk '{print $3}' | sortThirty seconds later you have the complete table inventory, sorted, with the prefix visible. If you see wp_, wp2_, and old_wp_ in the same dump, congratulations: you inherited three half-deleted WordPress installs glued together. If you see jos_, the site is Joomla 1.5 from before 2012. If you see both jos_ and j25_, someone tried to upgrade Joomla and gave up halfway through.
For a quick row-volume estimate per table, count the multi-row INSERT chunks:
gunzip -c dump.sql.gz \
| grep "^INSERT INTO \`wp_posts\`" \
| grep -o "),(" \
| wc -lThe number is approximate (it undercounts by one per INSERT statement and misses single-row tables), but for an audit it tells thousands of rows apart from tens of millions, which is the answer you actually need at this stage.
Pulling one table out of a four-gigabyte file
You almost never want to import the whole dump on the first pass. You want one table, often wp_users or customers, so you can see what columns exist and what the data looks like. The standard mysqldump format ships each table between two predictable comment markers, which is exactly what sed is good at:
gunzip -c dump.sql.gz \
| sed -n '/^-- Table structure for table `wp_users`/,/^-- Table structure for table/p' \
> wp_users.sqlYou now have a tiny file with the schema and all the data for one table, ready to load into a scratch database. The same range works for civicrm_contact, oc_products, anything. The closing pattern (the next -- Table structure marker) is what makes the range reliable across mysqldump versions.
If the previous agency used --skip-comments, the structure markers will not be there. Fall back to /CREATE TABLE `wp_users`/,/UNLOCK TABLES/p instead. Slower to type, but format-agnostic.
The red flags you want to find before import
Once you can read the dump, you can audit it. Four checks we always run:
Character set mismatches. A site migrated three times often carries tables in utf8, utf8mb4, and latin1 at once:
gunzip -c dump.sql.gz | grep -oE "DEFAULT CHARSET=[a-z0-9_]+" | sort -uIf that returns more than one row, expect mojibake. The fix happens at import time with --default-character-set, but you cannot fix what you have not measured.
Storage engines. A surprising number of legacy MySQL dumps still ship MyISAM tables in 2026. Find them:
gunzip -c dump.sql.gz | grep -oE "ENGINE=[A-Za-z]+" | sort | uniq -cMyISAM means no transactions and no foreign keys, which is its own conversation. You want to know before you go live. The MySQL reference manual has the conversion path.
Plaintext secrets. WordPress and Drupal both cache API keys, SMTP passwords, and signed tokens into options or variable tables. Grep before you import, rotate anything you find:
gunzip -c dump.sql.gz \
| grep -aiE "api[_-]?key|secret|smtp_pass|bearer " \
| head -40The -a flag tells grep to treat binary-looking lines as text, which matters when InnoDB has packed something odd into a BLOB column.
Unbounded cache tables. Drupal especially is notorious for shipping cache_* tables with millions of rows of expired session data nobody ever truncated. They bloat the dump, bloat the import, and serve zero purpose on the new stack. A two-second grep "^CREATE TABLE.*cache" surfaces them all.
Watching a long pipe with pv
Once the dump crosses a gigabyte, every command above starts to feel slow. Pipe through pv (Pipe Viewer) and you get a live progress bar:
pv dump.sql.gz | gunzip | grep -c "^INSERT INTO"That is the difference between staring at a frozen terminal and knowing you have 90 seconds left. Install it with brew install pv on macOS, apt install pv on Debian-flavoured Linux. An hour spent reading a dump cold beats a week debugging a half-imported one, every time.
The five-minute audit
When we took on a Drupal 7 to WordPress migration for a Rotterdam logistics client last year, the inherited dump weighed 6.1 GB compressed, but half of that was a cache_form table with eight years of unflushed session payloads we caught with grep and wc before any import ran. That kind of pre-flight pass is part of every legacy migration we run, because a dump you have actually read is a dump you can plan against.
On the next inherited .sql file that lands in your inbox, run file, head -200, and grep -E '^CREATE TABLE' before anything else. You will know more about it in 30 seconds than most teams learn after a four-hour import.
Key takeaway
An hour of reading a dump cold beats a week of debugging a half-imported one. grep, sed, and wc tell you almost everything before the first import.
FAQ
Can I read a .sql.gz without decompressing it first?
Yes. Use zless for paging, or pipe gunzip -c into less or grep. Neither writes a decompressed copy to disk, which matters when the dump is bigger than your free space.
What if the dump is UTF-16 from a Windows export?
file will flag it. Convert with iconv -f UTF-16LE -t UTF-8 before grepping. Most CLI tools assume single-byte or UTF-8 input and will return empty results otherwise.
How do I extract a single row from a multi-row INSERT?
You usually do not. Extract the whole table with sed, load it into a scratch database, then SELECT what you need. Parsing multi-row INSERT syntax with regex is a losing game.
Is mysqldump still the right format in 2026?
For audit work, yes. The text format is grep-friendly and version-portable. For backups and large transfers, the MySQL Shell dump utility is faster and produces parallel-loadable output.