Postgres query runs faster when run on more data

* Update *
After several days, I retried the query. This time the performance was similar for older date_ids (using both in (X,X) and = X); however, the problem still persists with new date_ids. I suspect this has something to do with the DB statistics or other maintenance procedures that run behind the scenes.
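
One way to check the statistics theory is sketched below. It is only a sketch: it assumes the table and column names that appear in the execution plans further down (recommendations, insight_recommendations, dsi_id), and it relies on the standard pg_stat_user_tables and pg_stats views. The idea is to see when the tables were last analyzed and what the planner currently believes about the filter column, then refresh the statistics and re-run the query for a new id.

```sql
-- Assumption: table and column names are taken from the execution plans
-- in this post (recommendations, insight_recommendations, dsi_id).

-- When were the tables last analyzed, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('recommendations', 'insight_recommendations');

-- What the planner currently knows about the filter column.
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename IN ('recommendations', 'insight_recommendations')
  AND attname = 'dsi_id';

-- Refresh the statistics by hand, then re-run the query for a new id.
ANALYZE recommendations;
ANALYZE insight_recommendations;
```

If a new id is missing from most_common_vals before the ANALYZE and present afterwards, that would be consistent with the row estimates for = X being off for recent data.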

* Original Post *

I’ve encountered strange behavior with a Postgres query.
In general, the query is structured as follows:

Select
<group by fields>
<aggregation fields>
from
(select *, <some functions on columns> from T1 where <and conditions> and date_id in (X)) T1 join
(select *, <some functions on columns> from T2 where <and conditions> and date_id in (X)) T2
on <join conditions>
group by <group by fields>

This query runs slowly on one specific date_id. However, if I include more date_ids in the query, it actually runs faster:

Select
<group by fields>
<aggregation fields>
from
(select * from T1 where <and conditions> and date_id in (X,Y)) T1 join
(select * from T2 where <and conditions> and date_id in (X,Y)) T2
on <join conditions>
group by <group by fields>

Even adding the same date_id twice to the “in” clause makes the query run faster:

Select
<group by fields>
<aggregation fields>
from
(select * from T1 where <and conditions> and date_id in (X,X)) T1 join
(select * from T2 where <and conditions> and date_id in (X,X)) T2
on <join conditions>
group by <group by fields>

This is the execution plan of the initial query, with in (X):

GroupAggregate  (cost=68376.20..68376.27 rows=1 width=207)
  ->  Sort  (cost=68376.20..68376.21 rows=1 width=207)
        Sort Key: by_prod.action
        ->  Subquery Scan on by_prod  (cost=68376.15..68376.19 rows=1 width=207)
              ->  HashAggregate  (cost=68376.15..68376.18 rows=1 width=372)
                    ->  Hash Join  (cost=421.18..68375.94 rows=1 width=372)
                          Hash Cond: (recommendations.insight_recommendation = insight_recommendations.id)
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          ->  Seq Scan on recommendations  (cost=0.00..67949.54 rows=1231 width=74)
                                Filter: (display AND (stores_in_pattern >= 5) AND (dsi_id = 44))
                          ->  Hash  (cost=410.99..410.99 rows=815 width=334)
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on insight_recommendations  (cost=0.42..410.99 rows=815 width=334)
                                      Index Cond: (dsi_id = 44)
                                      Filter: (((length(pattern) - length(replace(pattern, 'Clean'::text, ''::text))) / 5) < ((length(pattern) - length(replace(pattern, 'pattern'::text, ''::text))) / 7))

And this is the execution plan of the last query – in (X,X):

GroupAggregate  (cost=68816.70..68816.77 rows=1 width=207)
  ->  Sort  (cost=68816.70..68816.70 rows=1 width=207)
        Sort Key: by_prod.action
        ->  Subquery Scan on by_prod  (cost=68816.64..68816.69 rows=1 width=207)
              ->  HashAggregate  (cost=68816.64..68816.68 rows=1 width=372)
                    ->  Hash Join  (cost=842.20..68816.43 rows=1 width=372)
                          Hash Cond: ((recommendations.insight_recommendation = insight_recommendations.id) AND (recommendations.dsi_id = insight_recommendations.dsi_id))
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > ((selling + out_of_boundaries)) (...)
                          ->  Seq Scan on recommendations  (cost=0.00..67949.54 rows=2461 width=74)
                                Filter: (display AND (dsi_id = ANY ('{44,44}'::bigint[])) AND (stores_in_pattern >= 5))
                          ->  Hash  (cost=817.75..817.75 rows=1630 width=334)
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on insight_recommendations  (cost=0.42..817.75 rows=1630 width=334)
                                      Index Cond: (dsi_id = ANY ('{44,44}'::bigint[]))
                                      Filter: (((length(pattern) - length(replace(pattern, 'Clean'::text, ''::text))) / 5) < ((length(pattern) - length(replace(pattern, 'pattern'::text, ''::text))) / 7))

Any ideas on why this is happening, and how I can prevent the query from running slower on specific date_ids?

Thank you.

Update:
Here are the ‘explain (analyze, verbose)’ results:

Initial query:

GroupAggregate  (cost=68376.20..68376.27 rows=1 width=207) (actual time=520.264..522.015 rows=2 loops=1)
  Output: by_prod.action, count(1), avg(by_prod.cnt), avg(by_prod.avg_rev), (...)
  ->  Sort  (cost=68376.20..68376.21 rows=1 width=207) (actual time=520.208..520.263 rows=433 loops=1)
        Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
        Sort Key: by_prod.action
        Sort Method: quicksort  Memory: 85kB
        ->  Subquery Scan on by_prod  (cost=68376.15..68376.19 rows=1 width=207) (actual time=518.202..519.523 rows=433 loops=1)
              Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
              ->  HashAggregate  (cost=68376.15..68376.18 rows=1 width=372) (actual time=518.201..519.425 rows=433 loops=1)
                    Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, count(1), (...)
                    ->  Hash Join  (cost=421.18..68375.94 rows=1 width=372) (actual time=216.055..507.982 rows=433 loops=1)
                          Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, (...)
                          Hash Cond: (recommendations.insight_recommendation = insight_recommendations.id)
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          Rows Removed by Join Filter: 1433
                          ->  Seq Scan on public.recommendations  (cost=0.00..67949.54 rows=1231 width=74) (actual time=159.885..482.251 rows=2468 loops=1)
                                Output: recommendations.id, recommendations.action, (...)
                                Filter: (recommendations.display AND (recommendations.stores_in_pattern >= 5) AND (...)
                                Rows Removed by Filter: 501642
                          ->  Hash  (cost=410.99..410.99 rows=815 width=334) (actual time=19.431..19.431 rows=1839 loops=1)
                                Output: insight_recommendations.out_of_boundaries, insight_recommendations.pattern, insight_recommendations.id, insight_recommendations.dsi_id
                                Buckets: 1024  Batches: 1  Memory Usage: 477kB
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on public.insight_recommendations  (cost=0.42..410.99 rows=815 width=334) (actual time=0.196..17.917 rows=1839 loops=1)
                                      Output: insight_recommendations.out_of_boundaries, (...)
                                      Index Cond: (insight_recommendations.dsi_id = 44)
                                      Filter: (((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pattern, 'Clean'::text, ''::text))) / 5) < ((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pa (...)
                                      Rows Removed by Filter: 480

Query with in (X,X):

GroupAggregate  (cost=68816.70..68816.77 rows=1 width=207) (actual time=279.614..281.358 rows=2 loops=1)
  Output: by_prod.action, count(1), avg(by_prod.cnt), avg(by_prod.avg_rev), (...)
  ->  Sort  (cost=68816.70..68816.70 rows=1 width=207) (actual time=279.562..279.598 rows=433 loops=1)
        Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
        Sort Key: by_prod.action
        Sort Method: quicksort  Memory: 85kB
        ->  Subquery Scan on by_prod  (cost=68816.64..68816.69 rows=1 width=207) (actual time=277.514..278.891 rows=433 loops=1)
              Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
              ->  HashAggregate  (cost=68816.64..68816.68 rows=1 width=372) (actual time=277.512..278.796 rows=433 loops=1)
                    Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, count(1), (...)
                    ->  Hash Join  (cost=842.20..68816.43 rows=1 width=372) (actual time=108.400..267.563 rows=433 loops=1)
                          Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, (...)
                          Hash Cond: ((recommendations.insight_recommendation = insight_recommendations.id) AND (recommendations.dsi_id = insight_recommendations.dsi_id))
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          Rows Removed by Join Filter: 1433
                          ->  Seq Scan on public.recommendations  (cost=0.00..67949.54 rows=2461 width=74) (actual time=73.587..243.185 rows=2468 loops=1)
                                Output: recommendations.id, recommendations.action, (...)
                                Filter: (recommendations.display AND (recommendations.dsi_id = ANY ('{44,44}'::bigint[])) AND (recommendations.stores_in_pattern >= 5) (...)
                                Rows Removed by Filter: 501642
                          ->  Hash  (cost=817.75..817.75 rows=1630 width=334) (actual time=18.464..18.464 rows=1839 loops=1)
                                Output: insight_recommendations.out_of_boundaries, (...)
                                Buckets: 1024  Batches: 1  Memory Usage: 477kB
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on public.insight_recommendations  (cost=0.42..817.75 rows=1630 width=334) (actual time=0.048..17.134 rows=1839 loops=1)
                                      Output: insight_recommendations.out_of_boundaries, (...)
                                      Index Cond: (insight_recommendations.dsi_id = ANY ('{44,44}'::bigint[]))
                                      Filter: (((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pattern, 'Clean'::text, ''::text))) / 5) < ((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pa (...)
                                      Rows Removed by Filter: 480
