I have two nearly identical queries. The only difference, in the last two lines of the WHERE clause, is that one compares language_code values with = while the other tests them with IS NULL.
Query #1 takes only a few minutes:
UPDATE staging_localized_places AS lcity
SET state = lstate.name, country = lcountry.name
FROM staging_places AS city, staging_localized_places AS lstate, staging_localized_places AS lcountry
WHERE lcity.city_id = city.id
AND lstate.city_id = city.state_geonameid AND lstate.language_code = lcity.language_code
AND lcountry.city_id = city.country_geonameid AND lcountry.language_code = lcity.language_code
Query #2, on the other hand, takes much longer (probably hours):
UPDATE staging_localized_places AS lcity
SET state = lstate.name, country = lcountry.name
FROM staging_places AS city, staging_localized_places AS lstate, staging_localized_places AS lcountry
WHERE lcity.city_id = city.id
AND lstate.city_id = city.state_geonameid AND lstate.language_code IS NULL AND lcity.language_code IS NULL
AND lcountry.city_id = city.country_geonameid AND lcountry.language_code IS NULL AND lcity.language_code IS NULL
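For context on why the two queries cannot simply be merged into one: in SQL, = never evaluates to true when either side is NULL, so the join conditions of query #1 skip rows whose language_code is NULL, and query #2 has to test for them with IS NULL instead. A minimal illustration, independent of the tables above (the column aliases are only illustrative):

```sql
-- With default settings, comparing two NULLs with = yields NULL (unknown), not true,
-- so a join condition such as a.language_code = b.language_code never matches NULL rows.
SELECT NULL::text = NULL::text                    AS equals_result,  -- NULL
       NULL::text IS NOT DISTINCT FROM NULL::text AS not_distinct;   -- true
```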
I've already defined the following indexes on language_code, without success:
CREATE INDEX index_localized_cities_on_language_code ON staging_localized_places (language_code);
CREATE INDEX index_localized_cities_on_null_language_code ON staging_localized_places (language_code) WHERE language_code IS NULL;
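A quick check that may explain why neither index gets used is to measure how selective the IS NULL predicate actually is. This is only a sketch that assumes nothing beyond the table and column named above; the output column names are made up for illustration:

```sql
-- count(*) counts every row; count(language_code) counts only non-NULL values,
-- so the difference is the number of rows with a NULL language_code.
SELECT count(*)                        AS total_rows,
       count(*) - count(language_code) AS null_language_rows
FROM staging_localized_places;
```

If most rows turn out to have a NULL language_code (the planner's estimates in the plans suggest roughly 4.06 million out of 4.39 million), an index on that predicate filters out very little, and a sequential scan is likely the cheaper choice.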
Here’s the query plan for query #2 (slow):
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Update on staging_localized_places lcity (cost=3325699.06..3573882.49 rows=3917199 width=69)
  ->  Hash Join (cost=3325699.06..3573882.49 rows=3917199 width=69)
        Hash Cond: (lcity.city_id = city.id)
        ->  Seq Scan on staging_localized_places lcity (cost=0.00..73331.17 rows=3759221 width=29)
              Filter: ((language_code IS NULL) AND (language_code IS NULL))
        ->  Hash (cost=3235364.79..3235364.79 rows=4243222 width=44)
              ->  Merge Join (cost=3145905.93..3235364.79 rows=4243222 width=44)
                    Merge Cond: (city.country_geonameid = lcountry.city_id)
                    ->  Sort (cost=2376993.75..2387173.58 rows=4071931 width=31)
                          Sort Key: city.country_geonameid
                          ->  Merge Join (cost=1547044.44..1637673.07 rows=4071931 width=31)
                                Merge Cond: (lstate.city_id = city.state_geonameid)
                                ->  Sort (cost=768809.17..778959.72 rows=4060219 width=21)
                                      Sort Key: lstate.city_id
                                      ->  Seq Scan on staging_localized_places lstate (cost=0.00..73331.17 rows=4060219 width=21)
                                            Filter: (language_code IS NULL)
                                ->  Materialize (cost=778132.82..798493.29 rows=4072095 width=18)
                                      ->  Sort (cost=778132.82..788313.06 rows=4072095 width=18)
                                            Sort Key: city.state_geonameid
                                            ->  Seq Scan on staging_places city (cost=0.00..80540.95 rows=4072095 width=18)
                    ->  Materialize (cost=768809.17..789110.26 rows=4060219 width=21)
                          ->  Sort (cost=768809.17..778959.72 rows=4060219 width=21)
                                Sort Key: lcountry.city_id
                                ->  Seq Scan on staging_localized_places lcountry (cost=0.00..73331.17 rows=4060219 width=21)
                                      Filter: (language_code IS NULL)
(25 rows)
And here’s the query plan for query #1 (fast):
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Update on staging_localized_places lcity (cost=1801858.38..2214213.70 rows=110 width=69)
  ->  Nested Loop (cost=1801858.38..2214213.70 rows=110 width=69)
        Join Filter: ((lstate.language_code)::text = (lcountry.language_code)::text)
        ->  Hash Join (cost=1801858.38..2035500.71 rows=21540 width=59)
              Hash Cond: ((lcity.city_id = city.id) AND ((lcity.language_code)::text = (lstate.language_code)::text))
              ->  Seq Scan on staging_localized_places lcity (cost=0.00..73331.17 rows=4385317 width=29)
              ->  Hash (cost=1701528.88..1701528.88 rows=4397967 width=34)
                    ->  Merge Join (cost=1605176.45..1701528.88 rows=4397967 width=34)
                          Merge Cond: (city.state_geonameid = lstate.city_id)
                          ->  Sort (cost=778132.82..788313.06 rows=4072095 width=18)
                                Sort Key: city.state_geonameid
                                ->  Seq Scan on staging_places city (cost=0.00..80540.95 rows=4072095 width=18)
                          ->  Materialize (cost=826932.82..848859.40 rows=4385317 width=24)
                                ->  Sort (cost=826932.82..837896.11 rows=4385317 width=24)
                                      Sort Key: lstate.city_id
                                      ->  Seq Scan on staging_localized_places lstate (cost=0.00..73331.17 rows=4385317 width=24)
        ->  Index Scan using "1432037133_index_localized_cities_on_city_id" on staging_localized_places lcountry (cost=0.00..8.28 rows=1 width=24)
              Index Cond: (city_id = city.country_geonameid)
(18 rows)