PostgreSQL indexing for IS NULL

I have two similar queries. The only difference, in the last two lines of the WHERE clause, is that one tests language_code with IS NULL while the other joins on equal language_code values.

Query #1 takes only a few minutes:

UPDATE staging_localized_places AS lcity 
  SET state = lstate.name, country = lcountry.name
  FROM staging_places AS city, staging_localized_places AS lstate, staging_localized_places AS lcountry 
  WHERE lcity.city_id = city.id
  AND lstate.city_id = city.state_geonameid AND lstate.language_code = lcity.language_code
  AND lcountry.city_id = city.country_geonameid AND lcountry.language_code = lcity.language_code

Query #2 takes much longer (probably hours):

UPDATE staging_localized_places AS lcity 
  SET state = lstate.name, country = lcountry.name
  FROM staging_places AS city, staging_localized_places AS lstate, staging_localized_places AS lcountry 
  WHERE lcity.city_id = city.id
  AND lstate.city_id = city.state_geonameid AND lstate.language_code IS NULL AND lcity.language_code IS NULL
  AND lcountry.city_id = city.country_geonameid AND lcountry.language_code IS NULL AND lcity.language_code IS NULL
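
Query #2 presumably uses IS NULL because an equality join such as lstate.language_code = lcity.language_code never matches rows whose language_code is NULL. A minimal illustration of that PostgreSQL behaviour (the literal values are for demonstration only and are not taken from the tables above):

```sql
-- Equality against NULL yields NULL (unknown), so a WHERE clause rejects the row.
SELECT NULL::text = NULL::text AS equals_result;                     -- NULL

-- IS NOT DISTINCT FROM treats two NULLs as equal.
SELECT NULL::text IS NOT DISTINCT FROM NULL::text AS not_distinct;   -- true
```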

I’ve already defined the following indexes on language_code, but neither helps:

CREATE INDEX index_localized_cities_on_language_code ON staging_localized_places (language_code);
CREATE INDEX index_localized_cities_on_null_language_code ON staging_localized_places (language_code) WHERE language_code IS NULL;
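
One alternative that may be worth testing, sketched under assumptions: the query plans further down estimate that roughly 4.06 million of the 4.39 million rows in staging_localized_places have a NULL language_code, so an index on language_code alone filters out very little. A partial index keyed on the join column city_id might serve the joins better. The index name below is hypothetical, and whether the planner actually picks it depends on the data and the PostgreSQL version:

```sql
-- Hypothetical index name. Covers only rows where language_code IS NULL,
-- keyed on the column used in the join conditions.
CREATE INDEX index_localized_places_on_city_id_null_language
  ON staging_localized_places (city_id)
  WHERE language_code IS NULL;

-- Refresh planner statistics after building the index.
ANALYZE staging_localized_places;
```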

Here’s the query plan for query #2 (slow):

 QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Update on staging_localized_places lcity  (cost=3325699.06..3573882.49 rows=3917199 width=69)
   ->  Hash Join  (cost=3325699.06..3573882.49 rows=3917199 width=69)
         Hash Cond: (lcity.city_id = city.id)
         ->  Seq Scan on staging_localized_places lcity  (cost=0.00..73331.17 rows=3759221 width=29)
               Filter: ((language_code IS NULL) AND (language_code IS NULL))
         ->  Hash  (cost=3235364.79..3235364.79 rows=4243222 width=44)
               ->  Merge Join  (cost=3145905.93..3235364.79 rows=4243222 width=44)
                     Merge Cond: (city.country_geonameid = lcountry.city_id)
                     ->  Sort  (cost=2376993.75..2387173.58 rows=4071931 width=31)
                           Sort Key: city.country_geonameid
                           ->  Merge Join  (cost=1547044.44..1637673.07 rows=4071931 width=31)
                                 Merge Cond: (lstate.city_id = city.state_geonameid)
                                 ->  Sort  (cost=768809.17..778959.72 rows=4060219 width=21)
                                       Sort Key: lstate.city_id
                                       ->  Seq Scan on staging_localized_places lstate  (cost=0.00..73331.17 rows=4060219 width=21)
                                             Filter: (language_code IS NULL)
                                 ->  Materialize  (cost=778132.82..798493.29 rows=4072095 width=18)
                                       ->  Sort  (cost=778132.82..788313.06 rows=4072095 width=18)
                                             Sort Key: city.state_geonameid
                                             ->  Seq Scan on staging_places city  (cost=0.00..80540.95 rows=4072095 width=18)
                     ->  Materialize  (cost=768809.17..789110.26 rows=4060219 width=21)
                           ->  Sort  (cost=768809.17..778959.72 rows=4060219 width=21)
                                 Sort Key: lcountry.city_id
                                 ->  Seq Scan on staging_localized_places lcountry  (cost=0.00..73331.17 rows=4060219 width=21)
                                       Filter: (language_code IS NULL)
(25 rows)

And here’s the query plan for query #1 (fast):

QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Update on staging_localized_places lcity  (cost=1801858.38..2214213.70 rows=110 width=69)
   ->  Nested Loop  (cost=1801858.38..2214213.70 rows=110 width=69)
         Join Filter: ((lstate.language_code)::text = (lcountry.language_code)::text)
         ->  Hash Join  (cost=1801858.38..2035500.71 rows=21540 width=59)
               Hash Cond: ((lcity.city_id = city.id) AND ((lcity.language_code)::text = (lstate.language_code)::text))
               ->  Seq Scan on staging_localized_places lcity  (cost=0.00..73331.17 rows=4385317 width=29)
               ->  Hash  (cost=1701528.88..1701528.88 rows=4397967 width=34)
                     ->  Merge Join  (cost=1605176.45..1701528.88 rows=4397967 width=34)
                           Merge Cond: (city.state_geonameid = lstate.city_id)
                           ->  Sort  (cost=778132.82..788313.06 rows=4072095 width=18)
                                 Sort Key: city.state_geonameid
                                 ->  Seq Scan on staging_places city  (cost=0.00..80540.95 rows=4072095 width=18)
                           ->  Materialize  (cost=826932.82..848859.40 rows=4385317 width=24)
                                 ->  Sort  (cost=826932.82..837896.11 rows=4385317 width=24)
                                       Sort Key: lstate.city_id
                                       ->  Seq Scan on staging_localized_places lstate  (cost=0.00..73331.17 rows=4385317 width=24)
         ->  Index Scan using "1432037133_index_localized_cities_on_city_id" on staging_localized_places lcountry  (cost=0.00..8.28 rows=1 width=24)
               Index Cond: (city_id = city.country_geonameid)
(18 rows)
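
Both plans above come from plain EXPLAIN, so they show only estimates (about 3.9 million rows to update in query #2 versus 110 in query #1). To see actual row counts and timings, the statement can be run under EXPLAIN ANALYZE inside a transaction that is rolled back. This is only a sketch: it really executes the UPDATE, so it takes as long as the query itself, and the BUFFERS option assumes PostgreSQL 9.0 or later:

```sql
BEGIN;

-- EXPLAIN ANALYZE executes the statement; the ROLLBACK below discards its changes.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE staging_localized_places AS lcity
  SET state = lstate.name, country = lcountry.name
  FROM staging_places AS city, staging_localized_places AS lstate, staging_localized_places AS lcountry
  WHERE lcity.city_id = city.id
  AND lstate.city_id = city.state_geonameid AND lstate.language_code IS NULL AND lcity.language_code IS NULL
  AND lcountry.city_id = city.country_geonameid AND lcountry.language_code IS NULL;

ROLLBACK;
```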