I’m using PostgreSQL to host data that comes from an external source. Normal operation of the database is read-only, with periodic updates. The updates are quite large, though: in a 50 million row table, about 10 million rows get updated and 1 million inserted each time. Inserting via COPY is fast enough, but the update takes a very long time. Since the database can easily be recreated from the source, what could be done to increase update performance by reducing reliability? The query is basically:

UPDATE items SET name = items_import.name FROM items_import WHERE items.id = items_import.id
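
For context, a minimal sketch of what one update cycle looks like on my side, assuming the increment arrives as a CSV file (the file path and column list are placeholders; the table names match the plan below):

-- Stage the incremental data (~10M changed + ~1M new rows); this part is fast.
COPY items_import_inc (id, name) FROM '/path/to/increment.csv' WITH (FORMAT csv);

-- Apply the changed rows to the full table; this is the slow step.
UPDATE items_import_full AS items
SET    name = items_import_inc.name
FROM   items_import_inc
WHERE  items.id = items_import_inc.id;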
The query plan looks unremarkable to me (both tables have a primary key):
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Update on items_import_full i  (cost=487140.65..5258198.17 rows=8124429 width=210)
   ->  Hash Join  (cost=487140.65..5258198.17 rows=8124429 width=210)
         Hash Cond: (i.id = ii.id)
         ->  Seq Scan on items_import_full i  (cost=0.00..1753813.88 rows=45392988 width=178)
         ->  Hash  (cost=322112.29..322112.29 rows=8124429 width=36)
               ->  Seq Scan on items_import_inc ii  (cost=0.00..322112.29 rows=8124429 width=36)
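
For reference, minimal table definitions consistent with the plan above; the id and name types are assumptions, and the real tables carry more columns (hence the row widths in the plan):

-- Assumed shape of the two tables; both have a primary key on id.
CREATE TABLE items_import_full (
    id   bigint PRIMARY KEY,
    name text
    -- ... further imported columns (row width ~178 bytes per the plan)
);

CREATE TABLE items_import_inc (
    id   bigint PRIMARY KEY,
    name text
);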