Query:
with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state
        ,sum(wr_return_amt) as ctr_total_return
  from web_returns
      ,date_dim
      ,customer_address
  where wr_returned_date_sk = d_date_sk
    and d_year = 2002
    and wr_returning_addr_sk = ca_address_sk
  group by wr_returning_customer_sk
          ,ca_state)
select c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
      ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
      ,c_last_review_date,ctr_total_return
from customer_total_return ctr1
    ,customer_address
    ,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
                               from customer_total_return ctr2
                               where ctr1.ctr_state = ctr2.ctr_state)
  and ca_address_sk = c_current_addr_sk
  and ca_state = 'IL'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
        ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
        ,c_last_review_date,ctr_total_return
limit 100;
I have indexes on the following columns:
wr_returning_customer_sk
wr_returned_date_sk
d_date_sk
ca_address_sk
wr_returning_addr_sk
ca_address_sk
ca_state, ca_country
c_current_addr_sk
c_customer_sk
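The list above names ca_address_sk twice. The plans refer to indexes by name: customer_address_pkey, customer_pkey, date_dim_pkey, idx_wr_returned_date_sk and idx_customer_address_1. PostgreSQL's pg_indexes system view shows exactly which indexes exist and which columns each one covers. Here is a minimal sketch, assuming the tables use the names that appear in the query:

```sql
-- List every index on the four tables the query touches, with its full
-- definition (columns, uniqueness, index method).
select schemaname
      ,tablename
      ,indexname
      ,indexdef
from pg_indexes
where tablename in ('web_returns', 'date_dim', 'customer_address', 'customer')
order by schemaname, tablename, indexname;
```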
EXPLAIN ANALYZE result with enable_nestloop=on:
http://explain.depesz.com/s/F0d
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=7293.09..7293.10 rows=3 width=253) (actual time=16112.713..16112.723 rows=92 loops=1)
  CTE customer_total_return
    -> HashAggregate (cost=4563.72..4567.82 rows=328 width=13) (actual time=48.362..50.644 rows=13357 loops=1)
          Group Key: web_returns.wr_returning_customer_sk, customer_address_1.ca_state
          -> Nested Loop (cost=0.58..4561.26 rows=328 width=13) (actual time=4.310..41.416 rows=13517 loops=1)
                -> Nested Loop (cost=0.29..4445.05 rows=343 width=14) (actual time=4.304..18.758 rows=13862 loops=1)
                      -> Seq Scan on date_dim (cost=0.00..2318.11 rows=365 width=4) (actual time=4.294..8.421 rows=365 loops=1)
                            Filter: (d_year = 2002)
                            Rows Removed by Filter: 72684
                      -> Index Scan using idx_wr_returned_date_sk on web_returns (cost=0.29..5.51 rows=32 width=18) (actual time=0.002..0.019 rows=38 loops=365)
                            Index Cond: (wr_returned_date_sk = date_dim.d_date_sk)
                -> Index Scan using customer_address_pkey on customer_address customer_address_1 (cost=0.29..0.33 rows=1 width=7) (actual time=0.001..0.001 rows=1 loops=13862)
                      Index Cond: (ca_address_sk = web_returns.wr_returning_addr_sk)
  -> Sort (cost=2725.28..2725.29 rows=3 width=253) (actual time=16112.712..16112.717 rows=92 loops=1)
        Sort Key: customer.c_customer_id, customer.c_salutation, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_day, customer.c_birth_month, customer.c_birth_year, customer.c_birth_country, customer.c_login, customer.c_email_address, customer.c_last_review_date, ctr1.ctr_total_return
        Sort Method: quicksort Memory: 48kB
        -> Nested Loop (cost=0.58..2725.25 rows=3 width=253) (actual time=126.693..16112.312 rows=92 loops=1)
              -> Nested Loop (cost=0.29..2688.63 rows=109 width=257) (actual time=57.825..16100.184 rows=3245 loops=1)
                    -> CTE Scan on customer_total_return ctr1 (cost=0.00..2434.58 rows=109 width=36) (actual time=57.816..16077.175 rows=3255 loops=1)
                          Filter: (ctr_total_return > (SubPlan 2))
                          Rows Removed by Filter: 10102
                          SubPlan 2
                            -> Aggregate (cost=7.39..7.40 rows=1 width=32) (actual time=1.199..1.199 rows=1 loops=13357)
                                  -> CTE Scan on customer_total_return ctr2 (cost=0.00..7.38 rows=2 width=32) (actual time=0.031..1.144 rows=383 loops=13357)
                                        Filter: (ctr1.ctr_state = ctr_state)
                                        Rows Removed by Filter: 12974
                    -> Index Scan using customer_pkey on customer (cost=0.29..2.32 rows=1 width=229) (actual time=0.006..0.006 rows=1 loops=3255)
                          Index Cond: (c_customer_sk = ctr1.ctr_customer_sk)
              -> Index Scan using customer_address_pkey on customer_address (cost=0.29..0.33 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=3245)
                    Index Cond: (ca_address_sk = customer.c_current_addr_sk)
                    Filter: (ca_state = 'IL'::bpchar)
                    Rows Removed by Filter: 1
Planning time: 3.093 ms
Execution time: 16112.860 ms
(34 rows)
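Almost all of the 16.1 s in this plan is spent in the CTE Scan on ctr1, and specifically in SubPlan 2. The correlated subquery runs once per CTE row (loops=13357). Each run reads every CTE row: 383 kept plus 12974 removed by the filter, 13357 in total. It has to, because a CTE's result has no index on ctr_state. That gives 13357 runs × 1.199 ms ≈ 16.0 s, or 13357 × 13357 ≈ 178 million row visits.

The work therefore grows with the square of the number of CTE rows. The hypothetical diagnostic below repeats the CTE from the query and reports both numbers; the aliases ctr_rows and subplan_row_visits are made up for illustration. On the 1 GB data it should return the 13357 rows seen in the plan. Suppose the row count grows roughly 100× on the 100 GB data (an assumption, not a measurement). The subquery work would then grow roughly 10,000×, about 16 s × 10,000 ≈ 44 hours, which is in line with the two days reported.

```sql
-- Hypothetical diagnostic: how many rows the CTE holds, and how many row
-- visits SubPlan 2 makes when it rescans the whole CTE once per row (n * n).
with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state
        ,sum(wr_return_amt) as ctr_total_return
  from web_returns
      ,date_dim
      ,customer_address
  where wr_returned_date_sk = d_date_sk
    and d_year = 2002
    and wr_returning_addr_sk = ca_address_sk
  group by wr_returning_customer_sk
          ,ca_state)
select count(*) as ctr_rows
      ,count(*) * count(*) as subplan_row_visits  -- count(*) is bigint, so the product will not overflow at these sizes
from customer_total_return;
```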
EXPLAIN ANALYZE result with enable_nestloop=off:
http://explain.depesz.com/s/KlR
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=14098.69..14098.70 rows=3 width=253) (actual time=16505.885..16505.895 rows=92 loops=1)
  CTE customer_total_return
    -> HashAggregate (cost=6236.00..6240.10 rows=328 width=13) (actual time=276.027..278.330 rows=13357 loops=1)
          Group Key: web_returns.wr_returning_customer_sk, customer_address_1.ca_state
          -> Hash Join (cost=4455.76..6233.54 rows=328 width=13) (actual time=246.246..268.881 rows=13517 loops=1)
                Hash Cond: (customer_address_1.ca_address_sk = web_returns.wr_returning_addr_sk)
                -> Seq Scan on customer_address customer_address_1 (cost=0.00..1587.00 rows=50000 width=7) (actual time=0.002..4.910 rows=50000 loops=1)
                -> Hash (cost=4451.47..4451.47 rows=343 width=14) (actual time=246.228..246.228 rows=13517 loops=1)
                      Buckets: 1024 Batches: 1 Memory Usage: 632kB
                      -> Merge Join (cost=1426.00..4451.47 rows=343 width=14) (actual time=98.441..244.338 rows=13862 loops=1)
                            Merge Cond: (web_returns.wr_returned_date_sk = date_dim.d_date_sk)
                            -> Index Scan using idx_wr_returned_date_sk on web_returns (cost=0.29..2755.33 rows=71763 width=18) (actual time=0.004..92.517 rows=59301 loops=1)
                            -> Index Scan using date_dim_pkey on date_dim (cost=0.29..2905.95 rows=365 width=4) (actual time=7.100..142.563 rows=13837 loops=1)
                                  Filter: (d_year = 2002)
                                  Rows Removed by Filter: 958434
  -> Sort (cost=7858.59..7858.60 rows=3 width=253) (actual time=16505.883..16505.888 rows=92 loops=1)
        Sort Key: customer.c_customer_id, customer.c_salutation, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_day, customer.c_birth_month, customer.c_birth_year, customer.c_birth_country, customer.c_login, customer.c_email_address, customer.c_last_review_date, ctr1.ctr_total_return
        Sort Method: quicksort Memory: 48kB
        -> Hash Join (cost=3425.75..7858.57 rows=3 width=253) (actual time=16482.304..16505.567 rows=92 loops=1)
              Hash Cond: (customer.c_customer_sk = ctr1.ctr_customer_sk)
              -> Hash Join (cost=989.81..5370.73 rows=3192 width=225) (actual time=1.643..25.147 rows=3200 loops=1)
                    Hash Cond: (customer.c_current_addr_sk = customer_address.ca_address_sk)
                    -> Seq Scan on customer (cost=0.00..3849.00 rows=100000 width=229) (actual time=0.005..9.459 rows=100000 loops=1)
                    -> Hash (cost=969.86..969.86 rows=1596 width=4) (actual time=1.620..1.620 rows=1596 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 57kB
                          -> Bitmap Heap Scan on customer_address (cost=21.58..969.86 rows=1596 width=4) (actual time=0.280..1.424 rows=1596 loops=1)
                                Recheck Cond: (ca_state = 'IL'::bpchar)
                                Heap Blocks: exact=831
                                -> Bitmap Index Scan on idx_customer_address_1 (cost=0.00..21.18 rows=1596 width=0) (actual time=0.191..0.191 rows=1596 loops=1)
                                      Index Cond: (ca_state = 'IL'::bpchar)
              -> Hash (cost=2434.58..2434.58 rows=109 width=36) (actual time=16479.695..16479.695 rows=3245 loops=1)
                    Buckets: 1024 Batches: 1 Memory Usage: 140kB
                    -> CTE Scan on customer_total_return ctr1 (cost=0.00..2434.58 rows=109 width=36) (actual time=285.509..16478.614 rows=3255 loops=1)
                          Filter: (ctr_total_return > (SubPlan 2))
                          Rows Removed by Filter: 10102
                          SubPlan 2
                            -> Aggregate (cost=7.39..7.40 rows=1 width=32) (actual time=1.212..1.212 rows=1 loops=13357)
                                  -> CTE Scan on customer_total_return ctr2 (cost=0.00..7.38 rows=2 width=32) (actual time=0.031..1.157 rows=383 loops=13357)
                                        Filter: (ctr1.ctr_state = ctr_state)
                                        Rows Removed by Filter: 12974
Planning time: 8.813 ms
Execution time: 16506.079 ms
(42 rows)
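Turning enable_nestloop off replaces every nested loop with hash and merge joins, yet the total barely changes (16.5 s against 16.1 s). The CTE Scan on ctr1 still takes about 16.5 s, because SubPlan 2 again runs 13357 times at about 1.2 ms each, so the join strategy is not the bottleneck. Planner settings such as enable_nestloop are meant for experiments like this one. A sketch of confining the change to a single transaction, so that other sessions and later queries keep the default:

```sql
begin;
-- set local lasts only until the end of the current transaction
set local enable_nestloop = off;
show enable_nestloop;   -- off: run explain analyze on the query at this point
rollback;
show enable_nestloop;   -- back to the value in effect before begin (on by default)
```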
I created indexes on all the necessary columns, but nothing seems to help. With just 1 GB of data, I expect this to take less than 5 seconds; with 100 GB it currently takes 2 days!
I have set work_mem=1000MB. I am fairly new to interpreting query plans and would like to know how to improve the execution time. Thanks in advance!
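Both plans spend their time re-running the correlated subquery once per CTE row. One way to avoid this is to compute each state's threshold once, in a second CTE, and join to it. This is a sketch that has not been run against this data, so it is worth checking with EXPLAIN ANALYZE. It keeps the original tables, filters and output columns; the CTE name state_threshold and the column name threshold are made up for illustration. The join on ctr_state drops CTE rows whose ctr_state is NULL. The original behaves the same way: ctr1.ctr_state = ctr2.ctr_state is never true for NULL, so the subquery's average is NULL and the comparison fails.

```sql
with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state
        ,sum(wr_return_amt) as ctr_total_return
  from web_returns
      ,date_dim
      ,customer_address
  where wr_returned_date_sk = d_date_sk
    and d_year = 2002
    and wr_returning_addr_sk = ca_address_sk
  group by wr_returning_customer_sk
          ,ca_state)
,state_threshold as
 (-- hypothetical helper CTE: one row per state, 1.2 x that state's average
  select ctr_state
        ,avg(ctr_total_return) * 1.2 as threshold
  from customer_total_return
  group by ctr_state)
select c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
      ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
      ,c_last_review_date,ctr_total_return
from customer_total_return ctr1
join state_threshold st
  on st.ctr_state = ctr1.ctr_state    -- NULL states never match, as in the original
join customer
  on c_customer_sk = ctr1.ctr_customer_sk
join customer_address
  on ca_address_sk = c_current_addr_sk
where ctr1.ctr_total_return > st.threshold
  and ca_state = 'IL'
order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
        ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
        ,c_last_review_date,ctr_total_return
limit 100;
```

With this shape, the threshold costs one extra pass over the CTE (the group by in state_threshold). The work should therefore grow roughly linearly, not quadratically, with the number of CTE rows. A window function, avg(ctr_total_return) over (partition by ctr_state), could compute the same per-row threshold without a separate join, as long as rows with a NULL ctr_state are filtered out first.

On memory: the sorts and hashes whose memory is reported in both plans use at most 632kB, well under PostgreSQL's 4MB default, so work_mem=1000MB is unlikely to be what limits this query. If a larger value is ever needed, set it with SET in the session that runs the query rather than raising it server-wide to 1000MB, because each sort or hash operation in every session may use up to that amount.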