
Postgres query running faster, when running on more data


* Update *
After several days, I retried the query. This time the performance was similar for older date_ids (using both in (X,X) and = X); however, the problem still persists with new date_ids. I suspect this has something to do with the DB statistics or other maintenance procedures that run behind the scenes.
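
One way to check the statistics guess is sketched here. It assumes the tables and filter column are the ones that appear in the execution plans further down (recommendations, insight_recommendations, dsi_id); those names come from the plans, not from the query template, so adjust them as needed.

```sql
-- When were planner statistics last refreshed for the two tables?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('recommendations', 'insight_recommendations');

-- What does the planner currently believe about the filter column?
-- (dsi_id is assumed to be the column behind date_id in the template.)
SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename IN ('recommendations', 'insight_recommendations')
  AND attname = 'dsi_id';

-- Refresh the statistics by hand, then re-run EXPLAIN on the slow query.
ANALYZE recommendations;
ANALYZE insight_recommendations;
```

If a newly loaded id does not yet show up in most_common_vals, the planner has to fall back on a generic estimate for it. That might explain why row estimates for new ids are off until autovacuum's analyze pass catches up.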

* Original Post *

I’ve encountered strange behavior with a Postgres query.
In general, the query is structured as follows:

Select
<group by fields>
<aggregation fields>
from
(select *, <some functions on columns> from T1 where <and conditions> and date_id in (X)) T1 join
(select *, <some functions on columns> from T2 where <and conditions> and date_id in (X)) T2
on <join conditions>
group by <group by fields>

This query runs slowly for one specific date_id. However, if I include more date_ids in the query, it actually runs faster:

Select
<group by fields>
<aggregation fields>
from
(select * from T1 where <and conditions> and date_id in (X,Y)) T1 join
(select * from T2 where <and conditions> and date_id in (X,Y)) T2
on <join conditions>
group by <group by fields>

Even adding the same date_id twice to the “in” clause makes the query run faster:

Select
<group by fields>
<aggregation fields>
from
(select * from T1 where <and conditions> and date_id in (X,X)) T1 join
(select * from T2 where <and conditions> and date_id in (X,X)) T2
on <join conditions>
group by <group by fields>

This is the execution plan of the initial query – in (X):

GroupAggregate  (cost=68376.20..68376.27 rows=1 width=207)
  ->  Sort  (cost=68376.20..68376.21 rows=1 width=207)
        Sort Key: by_prod.action
        ->  Subquery Scan on by_prod  (cost=68376.15..68376.19 rows=1 width=207)
              ->  HashAggregate  (cost=68376.15..68376.18 rows=1 width=372)
                    ->  Hash Join  (cost=421.18..68375.94 rows=1 width=372)
                          Hash Cond: (recommendations.insight_recommendation = insight_recommendations.id)
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          ->  Seq Scan on recommendations  (cost=0.00..67949.54 rows=1231 width=74)
                                Filter: (display AND (stores_in_pattern >= 5) AND (dsi_id = 44))
                          ->  Hash  (cost=410.99..410.99 rows=815 width=334)
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on insight_recommendations  (cost=0.42..410.99 rows=815 width=334)
                                      Index Cond: (dsi_id = 44)
                                      Filter: (((length(pattern) - length(replace(pattern, 'Clean'::text, ''::text))) / 5) < ((length(pattern) - length(replace(pattern, 'pattern'::text, ''::text))) / 7))

And this is the execution plan of the last query – in (X,X):

GroupAggregate  (cost=68816.70..68816.77 rows=1 width=207)
  ->  Sort  (cost=68816.70..68816.70 rows=1 width=207)
        Sort Key: by_prod.action
        ->  Subquery Scan on by_prod  (cost=68816.64..68816.69 rows=1 width=207)
              ->  HashAggregate  (cost=68816.64..68816.68 rows=1 width=372)
                    ->  Hash Join  (cost=842.20..68816.43 rows=1 width=372)
                          Hash Cond: ((recommendations.insight_recommendation = insight_recommendations.id) AND (recommendations.dsi_id = insight_recommendations.dsi_id))
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > ((selling + out_of_boundaries)) (...)
                          ->  Seq Scan on recommendations  (cost=0.00..67949.54 rows=2461 width=74)
                                Filter: (display AND (dsi_id = ANY ('{44,44}'::bigint[])) AND (stores_in_pattern >= 5))
                          ->  Hash  (cost=817.75..817.75 rows=1630 width=334)
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on insight_recommendations  (cost=0.42..817.75 rows=1630 width=334)
                                      Index Cond: (dsi_id = ANY ('{44,44}'::bigint[]))
                                      Filter: (((length(pattern) - length(replace(pattern, 'Clean'::text, ''::text))) / 5) < ((length(pattern) - length(replace(pattern, 'pattern'::text, ''::text))) / 7))

Any ideas on why this is happening, and how I can prevent the query from running slower on specific date_ids?

Thank you.

Update:
Here are the ‘explain (analyze, verbose)’ results:

Initial query:

GroupAggregate  (cost=68376.20..68376.27 rows=1 width=207) (actual time=520.264..522.015 rows=2 loops=1)
  Output: by_prod.action, count(1), avg(by_prod.cnt), avg(by_prod.avg_rev), (...)
  ->  Sort  (cost=68376.20..68376.21 rows=1 width=207) (actual time=520.208..520.263 rows=433 loops=1)
        Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
        Sort Key: by_prod.action
        Sort Method: quicksort  Memory: 85kB
        ->  Subquery Scan on by_prod  (cost=68376.15..68376.19 rows=1 width=207) (actual time=518.202..519.523 rows=433 loops=1)
              Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
              ->  HashAggregate  (cost=68376.15..68376.18 rows=1 width=372) (actual time=518.201..519.425 rows=433 loops=1)
                    Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, count(1), (...)
                    ->  Hash Join  (cost=421.18..68375.94 rows=1 width=372) (actual time=216.055..507.982 rows=433 loops=1)
                          Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, (...)
                          Hash Cond: (recommendations.insight_recommendation = insight_recommendations.id)
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          Rows Removed by Join Filter: 1433
                          ->  Seq Scan on public.recommendations  (cost=0.00..67949.54 rows=1231 width=74) (actual time=159.885..482.251 rows=2468 loops=1)
                                Output: recommendations.id, recommendations.action, (...)
                                Filter: (recommendations.display AND (recommendations.stores_in_pattern >= 5) AND (...)
                                Rows Removed by Filter: 501642
                          ->  Hash  (cost=410.99..410.99 rows=815 width=334) (actual time=19.431..19.431 rows=1839 loops=1)
                                Output: insight_recommendations.out_of_boundaries, insight_recommendations.pattern, insight_recommendations.id, insight_recommendations.dsi_id
                                Buckets: 1024  Batches: 1  Memory Usage: 477kB
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on public.insight_recommendations  (cost=0.42..410.99 rows=815 width=334) (actual time=0.196..17.917 rows=1839 loops=1)
                                      Output: insight_recommendations.out_of_boundaries, (...)
                                      Index Cond: (insight_recommendations.dsi_id = 44)
                                      Filter: (((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pattern, 'Clean'::text, ''::text))) / 5) < ((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pa (...)
                                      Rows Removed by Filter: 480

Updated query, in (X,X):

GroupAggregate  (cost=68816.70..68816.77 rows=1 width=207) (actual time=279.614..281.358 rows=2 loops=1)
  Output: by_prod.action, count(1), avg(by_prod.cnt), avg(by_prod.avg_rev), (...)
  ->  Sort  (cost=68816.70..68816.70 rows=1 width=207) (actual time=279.562..279.598 rows=433 loops=1)
        Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
        Sort Key: by_prod.action
        Sort Method: quicksort  Memory: 85kB
        ->  Subquery Scan on by_prod  (cost=68816.64..68816.69 rows=1 width=207) (actual time=277.514..278.891 rows=433 loops=1)
              Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
              ->  HashAggregate  (cost=68816.64..68816.68 rows=1 width=372) (actual time=277.512..278.796 rows=433 loops=1)
                    Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, count(1), (...)
                    ->  Hash Join  (cost=842.20..68816.43 rows=1 width=372) (actual time=108.400..267.563 rows=433 loops=1)
                          Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, (...)
                          Hash Cond: ((recommendations.insight_recommendation = insight_recommendations.id) AND (recommendations.dsi_id = insight_recommendations.dsi_id))
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          Rows Removed by Join Filter: 1433
                          ->  Seq Scan on public.recommendations  (cost=0.00..67949.54 rows=2461 width=74) (actual time=73.587..243.185 rows=2468 loops=1)
                                Output: recommendations.id, recommendations.action, (...)
                                Filter: (recommendations.display AND (recommendations.dsi_id = ANY ('{44,44}'::bigint[])) AND (recommendations.stores_in_pattern >= 5) (...)
                                Rows Removed by Filter: 501642
                          ->  Hash  (cost=817.75..817.75 rows=1630 width=334) (actual time=18.464..18.464 rows=1839 loops=1)
                                Output: insight_recommendations.out_of_boundaries, (...)
                                Buckets: 1024  Batches: 1  Memory Usage: 477kB
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on public.insight_recommendations  (cost=0.42..817.75 rows=1630 width=334) (actual time=0.048..17.134 rows=1839 loops=1)
                                      Output: insight_recommendations.out_of_boundaries, (...)
                                      Index Cond: (insight_recommendations.dsi_id = ANY ('{44,44}'::bigint[]))
                                      Filter: (((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pattern, 'Clean'::text, ''::text))) / 5) < ((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pa (...)
                                      Rows Removed by Filter: 480
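
Both plans have the same shape, and both sequential scans read the same 504,110 rows of recommendations (2468 kept, 501642 removed). Even so, the scan finishes at about 482 ms in the first run and about 243 ms in the second. A minimal sketch for checking whether that gap comes from caching rather than from the plan is to isolate the scan and add the BUFFERS option. The simplified queries below are hypothetical: they use only the filter conditions visible in the non-verbose plans, and the real filter may contain more conditions hidden behind the (...).

```sql
-- Hypothetical reduction of the slow part: just the filtered scan on recommendations.
-- BUFFERS reports shared-buffer hits versus reads from disk for each node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM recommendations
WHERE display
  AND stores_in_pattern >= 5
  AND dsi_id = 44;

-- Same scan with the duplicated id, for a like-for-like comparison.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM recommendations
WHERE display
  AND stores_in_pattern >= 5
  AND dsi_id IN (44, 44);
```

Running each statement several times in alternating order should show whether the timing difference follows the predicate or simply follows whichever query runs second.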
