
Restoring SQL_ASCII dumps to a UTF8-encoded database


I’ve got a Postgres 8.4 environment where the encoding on all our databases is set to SQL_ASCII. We’re finally migrating to Postgres 9.2, and I would like to migrate everything over to UTF8 encoding.

Unfortunately, the text data in this DB is not clean. Trying to restore the pg_dump output to a UTF8-encoded database throws errors about invalid byte sequences, even if I specify --encoding=UTF8 when I run pg_dump (presumably because Postgres doesn’t know what to make of the bytes and just dumps them unchanged?).

We have a LOT of data (upwards of a million rows with text/string elements), and auditing all of it by hand would be very time-consuming (and error-prone), so I’d like to automate this if possible.

Is there an easy way to find the strings/text fields in the database that are not valid UTF8 so we can fix them? Or am I stuck with a manual audit to straighten this mess out?
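To make the kind of automation I have in mind concrete, here is a rough sketch (not a tested solution) that scans a plain-text dump file and reports the lines that fail to decode as UTF-8. It assumes the dump was written in pg_dump's plain SQL format rather than the custom or tar format, and the function name find_invalid_utf8_lines is just a placeholder for illustration.

```python
import sys


def find_invalid_utf8_lines(path):
    """Yield (line_number, raw_bytes) for each line that is not valid UTF-8.

    The file is read in binary mode so that no decoding happens implicitly.
    Splitting on b"\n" is safe because a newline byte never appears inside
    a multi-byte UTF-8 sequence.
    """
    with open(path, "rb") as dump:
        for line_number, raw in enumerate(dump, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                yield line_number, raw


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_bad_utf8.py dump.sql
    for line_number, raw in find_invalid_utf8_lines(sys.argv[1]):
        # Print only the first 80 bytes so long COPY rows stay readable.
        print(line_number, repr(raw[:80]))
```

This only tells me which lines of the dump are bad, not which encoding the offending bytes were originally written in, so fixing them would still need a second step.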

