* Update *
After several days, I retried the query. The performance is now similar for older date_ids (using either in (X,X) or =X); however, the problem still persists with new date_ids. My guess is that this has something to do with the DB statistics or other maintenance procedures that run behind the scenes.
* Original Post *
I’ve encountered a strange behavior with a Postgres query.
In general, the query is built as described here:
select
    <group by fields>,
    <aggregation fields>
from
    (select *, <some functions on columns> from T1 where <and conditions> and date_id in (X)) T1
join
    (select *, <some functions on columns> from T2 where <and conditions> and date_id in (X)) T2
    on <join conditions>
group by <group by fields>
This query runs slowly for one specific date_id. However, if I include more date_ids in the query, it actually runs faster:
select
    <group by fields>,
    <aggregation fields>
from
    (select * from T1 where <and conditions> and date_id in (X,Y)) T1
join
    (select * from T2 where <and conditions> and date_id in (X,Y)) T2
    on <join conditions>
group by <group by fields>
Even adding the same date_id twice to the “in” clause makes the query run faster:
select
    <group by fields>,
    <aggregation fields>
from
    (select * from T1 where <and conditions> and date_id in (X,X)) T1
join
    (select * from T2 where <and conditions> and date_id in (X,X)) T2
    on <join conditions>
group by <group by fields>
This is the execution plan of the initial query – in (X):
GroupAggregate  (cost=68376.20..68376.27 rows=1 width=207)
  ->  Sort  (cost=68376.20..68376.21 rows=1 width=207)
        Sort Key: by_prod.action
        ->  Subquery Scan on by_prod  (cost=68376.15..68376.19 rows=1 width=207)
              ->  HashAggregate  (cost=68376.15..68376.18 rows=1 width=372)
                    ->  Hash Join  (cost=421.18..68375.94 rows=1 width=372)
                          Hash Cond: (recommendations.insight_recommendation = insight_recommendations.id)
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          ->  Seq Scan on recommendations  (cost=0.00..67949.54 rows=1231 width=74)
                                Filter: (display AND (stores_in_pattern >= 5) AND (dsi_id = 44))
                          ->  Hash  (cost=410.99..410.99 rows=815 width=334)
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on insight_recommendations  (cost=0.42..410.99 rows=815 width=334)
                                      Index Cond: (dsi_id = 44)
                                      Filter: (((length(pattern) - length(replace(pattern, 'Clean'::text, ''::text))) / 5) < ((length(pattern) - length(replace(pattern, 'pattern'::text, ''::text))) / 7))
And this is the execution plan of the last query – in (X,X):
GroupAggregate  (cost=68816.70..68816.77 rows=1 width=207)
  ->  Sort  (cost=68816.70..68816.70 rows=1 width=207)
        Sort Key: by_prod.action
        ->  Subquery Scan on by_prod  (cost=68816.64..68816.69 rows=1 width=207)
              ->  HashAggregate  (cost=68816.64..68816.68 rows=1 width=372)
                    ->  Hash Join  (cost=842.20..68816.43 rows=1 width=372)
                          Hash Cond: ((recommendations.insight_recommendation = insight_recommendations.id) AND (recommendations.dsi_id = insight_recommendations.dsi_id))
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > ((selling + out_of_boundaries)) (...)
                          ->  Seq Scan on recommendations  (cost=0.00..67949.54 rows=2461 width=74)
                                Filter: (display AND (dsi_id = ANY ('{44,44}'::bigint[])) AND (stores_in_pattern >= 5))
                          ->  Hash  (cost=817.75..817.75 rows=1630 width=334)
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on insight_recommendations  (cost=0.42..817.75 rows=1630 width=334)
                                      Index Cond: (dsi_id = ANY ('{44,44}'::bigint[]))
                                      Filter: (((length(pattern) - length(replace(pattern, 'Clean'::text, ''::text))) / 5) < ((length(pattern) - length(replace(pattern, 'pattern'::text, ''::text))) / 7))
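One thing that stands out when the two plans are compared is that the planner's row estimates for both scans roughly double once the value is repeated in the list. The following is only a small Python sketch of that arithmetic, using the `rows=` figures copied from the two plans; the dictionary layout and the helper name `estimate_ratio` are made up for illustration and are not part of the query.

```python
# Planner row estimates (the "rows=" values) copied from the two EXPLAIN
# outputs: the in (X) plan and the in (X,X) plan.
estimates = {
    "Seq Scan on recommendations": {"in (X)": 1231, "in (X,X)": 2461},
    "Index Scan on insight_recommendations": {"in (X)": 815, "in (X,X)": 1630},
}


def estimate_ratio(single: int, repeated: int) -> float:
    """How much larger the estimate is when the value is listed twice."""
    return repeated / single


ratios = {
    node: estimate_ratio(rows["in (X)"], rows["in (X,X)"])
    for node, rows in estimates.items()
}

for node, ratio in ratios.items():
    print(f"{node}: estimate grows by a factor of {ratio:.2f}")
```

Both factors come out at about 2, even though the filter matches exactly the same rows in both queries.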
Any ideas on why this is happening, and how I can prevent the query from running slower on specific date_ids?
Thank you.
Update:
Here are the ‘explain (analyze, verbose)’ results:
Initial query:
GroupAggregate  (cost=68376.20..68376.27 rows=1 width=207) (actual time=520.264..522.015 rows=2 loops=1)
  Output: by_prod.action, count(1), avg(by_prod.cnt), avg(by_prod.avg_rev), (...)
  ->  Sort  (cost=68376.20..68376.21 rows=1 width=207) (actual time=520.208..520.263 rows=433 loops=1)
        Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
        Sort Key: by_prod.action
        Sort Method: quicksort Memory: 85kB
        ->  Subquery Scan on by_prod  (cost=68376.15..68376.19 rows=1 width=207) (actual time=518.202..519.523 rows=433 loops=1)
              Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
              ->  HashAggregate  (cost=68376.15..68376.18 rows=1 width=372) (actual time=518.201..519.425 rows=433 loops=1)
                    Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, count(1), (...)
                    ->  Hash Join  (cost=421.18..68375.94 rows=1 width=372) (actual time=216.055..507.982 rows=433 loops=1)
                          Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, (...)
                          Hash Cond: (recommendations.insight_recommendation = insight_recommendations.id)
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          Rows Removed by Join Filter: 1433
                          ->  Seq Scan on public.recommendations  (cost=0.00..67949.54 rows=1231 width=74) (actual time=159.885..482.251 rows=2468 loops=1)
                                Output: recommendations.id, recommendations.action, (...)
                                Filter: (recommendations.display AND (recommendations.stores_in_pattern >= 5) AND (...)
                                Rows Removed by Filter: 501642
                          ->  Hash  (cost=410.99..410.99 rows=815 width=334) (actual time=19.431..19.431 rows=1839 loops=1)
                                Output: insight_recommendations.out_of_boundaries, insight_recommendations.pattern, insight_recommendations.id, insight_recommendations.dsi_id
                                Buckets: 1024 Batches: 1 Memory Usage: 477kB
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on public.insight_recommendations  (cost=0.42..410.99 rows=815 width=334) (actual time=0.196..17.917 rows=1839 loops=1)
                                      Output: insight_recommendations.out_of_boundaries, (...)
                                      Index Cond: (insight_recommendations.dsi_id = 44)
                                      Filter: (((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pattern, 'Clean'::text, ''::text))) / 5) < ((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pa (...)
                                      Rows Removed by Filter: 480
Updated query – in (X,X):
GroupAggregate  (cost=68816.70..68816.77 rows=1 width=207) (actual time=279.614..281.358 rows=2 loops=1)
  Output: by_prod.action, count(1), avg(by_prod.cnt), avg(by_prod.avg_rev), (...)
  ->  Sort  (cost=68816.70..68816.70 rows=1 width=207) (actual time=279.562..279.598 rows=433 loops=1)
        Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
        Sort Key: by_prod.action
        Sort Method: quicksort Memory: 85kB
        ->  Subquery Scan on by_prod  (cost=68816.64..68816.69 rows=1 width=207) (actual time=277.514..278.891 rows=433 loops=1)
              Output: by_prod.action, by_prod.cnt, by_prod.avg_rev, (...)
              ->  HashAggregate  (cost=68816.64..68816.68 rows=1 width=372) (actual time=277.512..278.796 rows=433 loops=1)
                    Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, count(1), (...)
                    ->  Hash Join  (cost=842.20..68816.43 rows=1 width=372) (actual time=108.400..267.563 rows=433 loops=1)
                          Output: recommendations.action, recommendations.product_key, recommendations.store_identifier, (...)
                          Hash Cond: ((recommendations.insight_recommendation = insight_recommendations.id) AND (recommendations.dsi_id = insight_recommendations.dsi_id))
                          Join Filter: (((((recommendations.action)::text = 'IMPACT'::text) AND ((patterns + action_items) > (selling + out_of_boundaries)) (...)
                          Rows Removed by Join Filter: 1433
                          ->  Seq Scan on public.recommendations  (cost=0.00..67949.54 rows=2461 width=74) (actual time=73.587..243.185 rows=2468 loops=1)
                                Output: recommendations.id, recommendations.action, (...)
                                Filter: (recommendations.display AND (recommendations.dsi_id = ANY ('{44,44}'::bigint[])) AND (recommendations.stores_in_pattern >= 5) (...)
                                Rows Removed by Filter: 501642
                          ->  Hash  (cost=817.75..817.75 rows=1630 width=334) (actual time=18.464..18.464 rows=1839 loops=1)
                                Output: insight_recommendations.out_of_boundaries, (...)
                                Buckets: 1024 Batches: 1 Memory Usage: 477kB
                                ->  Index Scan using uk_488tmd4srok45l5lda23pxmd3 on public.insight_recommendations  (cost=0.42..817.75 rows=1630 width=334) (actual time=0.048..17.134 rows=1839 loops=1)
                                      Output: insight_recommendations.out_of_boundaries, (...)
                                      Index Cond: (insight_recommendations.dsi_id = ANY ('{44,44}'::bigint[]))
                                      Filter: (((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pattern, 'Clean'::text, ''::text))) / 5) < ((length(insight_recommendations.pattern) - length(replace(insight_recommendations.pa (...)
                                      Rows Removed by Filter: 480
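To summarize the two ‘explain (analyze, verbose)’ outputs side by side, here is a minimal Python sketch that compares the planner's estimated rows with the actual rows for each scan node, and the total run time of each query. All figures are copied from the plans; the `ScanNode` class and the variable names are hypothetical helpers for this comparison only, not anything Postgres provides.

```python
from dataclasses import dataclass


@dataclass
class ScanNode:
    """One scan node from an EXPLAIN ANALYZE plan (figures copied by hand)."""
    plan: str
    node: str
    estimated_rows: int
    actual_rows: int

    @property
    def misestimate(self) -> float:
        """Actual rows divided by estimated rows (1.0 is a perfect estimate)."""
        return self.actual_rows / self.estimated_rows


nodes = [
    ScanNode("in (X)", "Seq Scan on recommendations", 1231, 2468),
    ScanNode("in (X)", "Index Scan on insight_recommendations", 815, 1839),
    ScanNode("in (X,X)", "Seq Scan on recommendations", 2461, 2468),
    ScanNode("in (X,X)", "Index Scan on insight_recommendations", 1630, 1839),
]

# Upper bound of "actual time" on the top GroupAggregate node, in ms.
total_ms = {"in (X)": 522.015, "in (X,X)": 281.358}
speedup = total_ms["in (X)"] / total_ms["in (X,X)"]

for n in nodes:
    print(f"{n.plan:9} {n.node:40} actual/estimated = {n.misestimate:.2f}")
print(f"in (X) takes {speedup:.2f} times as long as in (X,X)")
```

In the in (X) plan both scans return roughly twice as many rows as estimated, while in the in (X,X) plan the estimates are close to the actual counts. Both plans read and discard the same number of rows, yet the in (X) query takes a bit under twice as long.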