How can I make this query run faster in PostgreSQL? [closed]

Query:

with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state,
        sum(wr_return_amt) as ctr_total_return
 from web_returns
     ,date_dim
     ,customer_address
 where wr_returned_date_sk = d_date_sk
   and d_year =2002
   and wr_returning_addr_sk = ca_address_sk
 group by wr_returning_customer_sk
         ,ca_state)
  select  c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
       ,c_last_review_date,ctr_total_return
 from customer_total_return ctr1
     ,customer_address
     ,customer
 where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
                          from customer_total_return ctr2
                          where ctr1.ctr_state = ctr2.ctr_state)
       and ca_address_sk = c_current_addr_sk
       and ca_state = 'IL'
       and ctr1.ctr_customer_sk = c_customer_sk
 order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                  ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                  ,c_last_review_date,ctr_total_return
limit 100;

I have created indexes on the following columns:

wr_returning_customer_sk
wr_returned_date_sk
d_date_sk
ca_address_sk
wr_returning_addr_sk
ca_address_sk
ca_state, ca_country
c_current_addr_sk
c_customer_sk

EXPLAIN ANALYZE output with enable_nestloop=on:

http://explain.depesz.com/s/F0d

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=7293.09..7293.10 rows=3 width=253) (actual time=16112.713..16112.723 rows=92 loops=1)
   CTE customer_total_return
     ->  HashAggregate  (cost=4563.72..4567.82 rows=328 width=13) (actual time=48.362..50.644 rows=13357 loops=1)
           Group Key: web_returns.wr_returning_customer_sk, customer_address_1.ca_state
           ->  Nested Loop  (cost=0.58..4561.26 rows=328 width=13) (actual time=4.310..41.416 rows=13517 loops=1)
                 ->  Nested Loop  (cost=0.29..4445.05 rows=343 width=14) (actual time=4.304..18.758 rows=13862 loops=1)
                       ->  Seq Scan on date_dim  (cost=0.00..2318.11 rows=365 width=4) (actual time=4.294..8.421 rows=365 loops=1)
                             Filter: (d_year = 2002)
                             Rows Removed by Filter: 72684
                       ->  Index Scan using idx_wr_returned_date_sk on web_returns  (cost=0.29..5.51 rows=32 width=18) (actual time=0.002..0.019 rows=38 loops=365)
                             Index Cond: (wr_returned_date_sk = date_dim.d_date_sk)
                 ->  Index Scan using customer_address_pkey on customer_address customer_address_1  (cost=0.29..0.33 rows=1 width=7) (actual time=0.001..0.001 rows=1 loops=13862)
                       Index Cond: (ca_address_sk = web_returns.wr_returning_addr_sk)
   ->  Sort  (cost=2725.28..2725.29 rows=3 width=253) (actual time=16112.712..16112.717 rows=92 loops=1)
         Sort Key: customer.c_customer_id, customer.c_salutation, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_day, customer.c_birth_month, customer.c_birth_year, customer.c_birth_country, customer.c_login, customer.c_email_address, customer.c_last_review_date, ctr1.ctr_total_return
         Sort Method: quicksort  Memory: 48kB
         ->  Nested Loop  (cost=0.58..2725.25 rows=3 width=253) (actual time=126.693..16112.312 rows=92 loops=1)
               ->  Nested Loop  (cost=0.29..2688.63 rows=109 width=257) (actual time=57.825..16100.184 rows=3245 loops=1)
                     ->  CTE Scan on customer_total_return ctr1  (cost=0.00..2434.58 rows=109 width=36) (actual time=57.816..16077.175 rows=3255 loops=1)
                           Filter: (ctr_total_return > (SubPlan 2))
                           Rows Removed by Filter: 10102
                           SubPlan 2
                             ->  Aggregate  (cost=7.39..7.40 rows=1 width=32) (actual time=1.199..1.199 rows=1 loops=13357)
                                   ->  CTE Scan on customer_total_return ctr2  (cost=0.00..7.38 rows=2 width=32) (actual time=0.031..1.144 rows=383 loops=13357)
                                         Filter: (ctr1.ctr_state = ctr_state)
                                         Rows Removed by Filter: 12974
                     ->  Index Scan using customer_pkey on customer  (cost=0.29..2.32 rows=1 width=229) (actual time=0.006..0.006 rows=1 loops=3255)
                           Index Cond: (c_customer_sk = ctr1.ctr_customer_sk)
               ->  Index Scan using customer_address_pkey on customer_address  (cost=0.29..0.33 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=3245)
                     Index Cond: (ca_address_sk = customer.c_current_addr_sk)
                     Filter: (ca_state = 'IL'::bpchar)
                     Rows Removed by Filter: 1
 Planning time: 3.093 ms
 Execution time: 16112.860 ms
(34 rows)

EXPLAIN ANALYZE output with enable_nestloop=off:

http://explain.depesz.com/s/KlR

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=14098.69..14098.70 rows=3 width=253) (actual time=16505.885..16505.895 rows=92 loops=1)
   CTE customer_total_return
     ->  HashAggregate  (cost=6236.00..6240.10 rows=328 width=13) (actual time=276.027..278.330 rows=13357 loops=1)
           Group Key: web_returns.wr_returning_customer_sk, customer_address_1.ca_state
           ->  Hash Join  (cost=4455.76..6233.54 rows=328 width=13) (actual time=246.246..268.881 rows=13517 loops=1)
                 Hash Cond: (customer_address_1.ca_address_sk = web_returns.wr_returning_addr_sk)
                 ->  Seq Scan on customer_address customer_address_1  (cost=0.00..1587.00 rows=50000 width=7) (actual time=0.002..4.910 rows=50000 loops=1)
                 ->  Hash  (cost=4451.47..4451.47 rows=343 width=14) (actual time=246.228..246.228 rows=13517 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 632kB
                       ->  Merge Join  (cost=1426.00..4451.47 rows=343 width=14) (actual time=98.441..244.338 rows=13862 loops=1)
                             Merge Cond: (web_returns.wr_returned_date_sk = date_dim.d_date_sk)
                             ->  Index Scan using idx_wr_returned_date_sk on web_returns  (cost=0.29..2755.33 rows=71763 width=18) (actual time=0.004..92.517 rows=59301 loops=1)
                             ->  Index Scan using date_dim_pkey on date_dim  (cost=0.29..2905.95 rows=365 width=4) (actual time=7.100..142.563 rows=13837 loops=1)
                                   Filter: (d_year = 2002)
                                   Rows Removed by Filter: 958434
   ->  Sort  (cost=7858.59..7858.60 rows=3 width=253) (actual time=16505.883..16505.888 rows=92 loops=1)
         Sort Key: customer.c_customer_id, customer.c_salutation, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_day, customer.c_birth_month, customer.c_birth_year, customer.c_birth_country, customer.c_login, customer.c_email_address, customer.c_last_review_date, ctr1.ctr_total_return
         Sort Method: quicksort  Memory: 48kB
         ->  Hash Join  (cost=3425.75..7858.57 rows=3 width=253) (actual time=16482.304..16505.567 rows=92 loops=1)
               Hash Cond: (customer.c_customer_sk = ctr1.ctr_customer_sk)
               ->  Hash Join  (cost=989.81..5370.73 rows=3192 width=225) (actual time=1.643..25.147 rows=3200 loops=1)
                     Hash Cond: (customer.c_current_addr_sk = customer_address.ca_address_sk)
                     ->  Seq Scan on customer  (cost=0.00..3849.00 rows=100000 width=229) (actual time=0.005..9.459 rows=100000 loops=1)
                     ->  Hash  (cost=969.86..969.86 rows=1596 width=4) (actual time=1.620..1.620 rows=1596 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 57kB
                           ->  Bitmap Heap Scan on customer_address  (cost=21.58..969.86 rows=1596 width=4) (actual time=0.280..1.424 rows=1596 loops=1)
                                 Recheck Cond: (ca_state = 'IL'::bpchar)
                                 Heap Blocks: exact=831
                                 ->  Bitmap Index Scan on idx_customer_address_1  (cost=0.00..21.18 rows=1596 width=0) (actual time=0.191..0.191 rows=1596 loops=1)
                                       Index Cond: (ca_state = 'IL'::bpchar)
               ->  Hash  (cost=2434.58..2434.58 rows=109 width=36) (actual time=16479.695..16479.695 rows=3245 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 140kB
                     ->  CTE Scan on customer_total_return ctr1  (cost=0.00..2434.58 rows=109 width=36) (actual time=285.509..16478.614 rows=3255 loops=1)
                           Filter: (ctr_total_return > (SubPlan 2))
                           Rows Removed by Filter: 10102
                           SubPlan 2
                             ->  Aggregate  (cost=7.39..7.40 rows=1 width=32) (actual time=1.212..1.212 rows=1 loops=13357)
                                   ->  CTE Scan on customer_total_return ctr2  (cost=0.00..7.38 rows=2 width=32) (actual time=0.031..1.157 rows=383 loops=13357)
                                         Filter: (ctr1.ctr_state = ctr_state)
                                         Rows Removed by Filter: 12974
 Planning time: 8.813 ms
 Execution time: 16506.079 ms
(42 rows)

I have created indexes on every column used in a join or filter, but nothing seems to help. With only 1 GB of data I expect this query to finish in under 5 seconds, yet both plans above take about 16 seconds. With 100 GB of data it currently takes 2 days!

I have set work_mem=1000MB. I am fairly new to interpreting query plans and would like to know how to improve the execution time. Thanks in advance!
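
Reading the two plans side by side, almost all of the time appears to be spent in SubPlan 2. It runs once per row of ctr1 (loops=13357) at roughly 1.2 ms per run, which is about 16 seconds of the 16.1 second total, and each run rescans all 13357 rows of the CTE. Switching enable_nestloop changes the joins around that node but not the node itself, which would explain why both plans take the same time. One rewrite that may be worth trying is to compute the per-state threshold once in a second CTE and join to it instead of using a correlated subquery. The sketch below is untested against this data set; state_avg and ctr_state_threshold are names made up for the example, and everything else is taken from the query above.

```sql
-- Sketch only: same tables, columns and filters as the original query.
-- state_avg and ctr_state_threshold are hypothetical names for this example.
with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state
        ,sum(wr_return_amt) as ctr_total_return
  from web_returns
      ,date_dim
      ,customer_address
  where wr_returned_date_sk = d_date_sk
    and d_year = 2002
    and wr_returning_addr_sk = ca_address_sk
  group by wr_returning_customer_sk
          ,ca_state),
 -- One row per state, computed in a single pass over the CTE,
 -- instead of one aggregate per row of ctr1.
 state_avg as
 (select ctr_state
        ,avg(ctr_total_return)*1.2 as ctr_state_threshold
  from customer_total_return
  group by ctr_state)
select c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
      ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
      ,c_last_review_date,ctr_total_return
from customer_total_return ctr1
    ,state_avg sa
    ,customer_address
    ,customer
where sa.ctr_state = ctr1.ctr_state
  and ctr1.ctr_total_return > sa.ctr_state_threshold
  and ca_address_sk = c_current_addr_sk
  and ca_state = 'IL'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
        ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
        ,c_last_review_date,ctr_total_return
limit 100;
```

Rows whose ctr_state is NULL are dropped by the equality join here, and they are also dropped by the original query, because the average over an empty set is NULL and the comparison then fails. The two forms should therefore return the same rows, but it is worth comparing their output on the 1 GB data set before relying on the rewrite.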

