I’m using PostgreSQL 9.3.5 on a CentOS system. I have a very large table (twenty million rows, 50+ columns) that aggregates data from a few hundred systems. Several times a day each system sends me updates for that table, and one of those updates is a list of ids for which I need to set a new value in one column. I do that by loading the list of ids into a smaller table and joining against it. Most of the time this is pretty fast (under 1 second), but about 10% of the time the update takes anywhere from 3 to 180 seconds. I’m trying to figure out how to reduce or eliminate those outliers.
remotedata is the very large table; valid_remotedata is the one used for updates.
                                          Table "remotedata"
      Column      |            Type             |                              Modifiers
------------------+-----------------------------+---------------------------------------------------------------------
 system_id        | character varying(32)       | not null default ''::character varying
 id               | bigint                      | not null default 0::bigint
 (50 more columns of varying types)
 expire_confirmed | character(1)                | default 'f'::bpchar
Indexes:
    "remotedata_pkey" PRIMARY KEY, btree (system_id, id)
          Table "valid_remotedata"
   Column   |         Type          | Modifiers
------------+-----------------------+-----------
 system_id  | character varying(32) | not null
 id         | bigint                | not null
Indexes:
    "valid_remotedata_pkey" PRIMARY KEY, btree (system_id, id)
    "valid_remote_system" btree (system_id)
Update statement:
UPDATE remotedata r SET expire_confirmed='t'
FROM valid_remotedata vr
WHERE vr.id=r.id and vr.system_id=r.system_id
AND vr.system_id='223344';
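For context, the load step described above might look roughly like the following. This is only a sketch: the literal id values are made up, and clearing the system's previous rows before each load is an assumption, not something taken from the production job.

```sql
-- Hypothetical load-and-update cycle for one system; the id values are invented.
BEGIN;

-- Assumption: rows left over from this system's previous batch are removed first.
DELETE FROM valid_remotedata WHERE system_id = '223344';

-- Stage the ids reported by the system.
INSERT INTO valid_remotedata (system_id, id)
VALUES ('223344', 1001),
       ('223344', 1002),
       ('223344', 1003);

-- Flag the matching rows in the large table (same statement as above).
UPDATE remotedata r
SET expire_confirmed = 't'
FROM valid_remotedata vr
WHERE vr.id = r.id
  AND vr.system_id = r.system_id
  AND vr.system_id = '223344';

COMMIT;
```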
explain analyze output:
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Update on remotedata r  (cost=353.49..25563.39 rows=11 width=1701) (actual time=8861.469..8861.469 rows=0 loops=1)
   ->  Nested Loop  (cost=353.49..25563.39 rows=11 width=1701) (actual time=15.734..1638.006 rows=71233 loops=1)
         ->  Bitmap Heap Scan on valid_remotedata vr  (cost=352.93..7430.16 rows=2113 width=27) (actual time=15.684..54.007 rows=71233 loops=1)
               Recheck Cond: ((system_id)::text = '223344'::text)
               ->  Bitmap Index Scan on valid_remote_system  (cost=0.00..352.40 rows=2113 width=0) (actual time=15.585..15.585 rows=71233 loops=1)
                     Index Cond: ((system_id)::text = '223344'::text)
         ->  Index Scan using remotedata_pkey on remotedata r  (cost=0.56..8.57 rows=1 width=1695) (actual time=0.020..0.020 rows=1 loops=71233)
               Index Cond: (((system_id)::text = '223344'::text) AND (id = vr.id))
 Total runtime: 8861.651 ms
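For anyone reproducing this: EXPLAIN ANALYZE actually executes the UPDATE, so a plan like the one above can be captured without keeping the changes by wrapping the statement in a transaction and rolling it back. A minimal sketch follows. The BUFFERS option was not used for the output above; it is added here on the assumption that buffer hit/read counts would help when comparing a fast run with a slow one.

```sql
-- Capture a plan for the update without persisting its effects.
BEGIN;

-- ANALYZE runs the statement for real; BUFFERS (an addition, not in the
-- original output) also reports shared-buffer hits and reads per plan node.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE remotedata r
SET expire_confirmed = 't'
FROM valid_remotedata vr
WHERE vr.id = r.id
  AND vr.system_id = r.system_id
  AND vr.system_id = '223344';

-- Discard the rows changed by the instrumented run.
ROLLBACK;
```

Note that the rolled-back update still takes row locks and writes new row versions while it runs, so it is not free on a busy table.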