I have a mildly complex query that performs rather poorly:
UPDATE
    web_pages
SET
    state = 'fetching'
WHERE
    web_pages.id = (
        SELECT
            web_pages.id
        FROM
            web_pages
        WHERE
            web_pages.state = 'new'
        AND
            normal_fetch_mode = true
        AND
            web_pages.priority = (
                SELECT
                    min(priority)
                FROM
                    web_pages
                WHERE
                    state = 'new'::dlstate_enum
                AND
                    distance < 1000000
                AND
                    normal_fetch_mode = true
                AND
                    web_pages.ignoreuntiltime < current_timestamp + '5 minutes'::interval
            )
        AND
            web_pages.distance < 1000000
        AND
            web_pages.ignoreuntiltime < current_timestamp + '5 minutes'::interval
        LIMIT 1
    )
AND
    web_pages.state = 'new'
RETURNING
    web_pages.id;
EXPLAIN ANALYZE output:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Update on web_pages  (cost=2.12..10.14 rows=1 width=798) (actual time=2312.127..2312.127 rows=0 loops=1)
  InitPlan 3 (returns $2)
    ->  Limit  (cost=1.21..1.56 rows=1 width=4) (actual time=2312.118..2312.118 rows=0 loops=1)
          InitPlan 2 (returns $1)
            ->  Result  (cost=0.77..0.78 rows=1 width=0) (actual time=2312.109..2312.110 rows=1 loops=1)
                  InitPlan 1 (returns $0)
                    ->  Limit  (cost=0.43..0.77 rows=1 width=4) (actual time=2312.106..2312.106 rows=0 loops=1)
                          ->  Index Scan using ix_web_pages_distance_filtered on web_pages web_pages_1  (cost=0.43..176587.44 rows=509043 width=4) (actual time=2312.103..2312.103 rows=0 loops=1)
                                Index Cond: (priority IS NOT NULL)
                                Filter: (ignoreuntiltime < (now() + '00:05:00'::interval))
          ->  Index Scan using ix_web_pages_distance_filtered on web_pages web_pages_2  (cost=0.43..35375.47 rows=101809 width=4) (actual time=2312.116..2312.116 rows=0 loops=1)
                Index Cond: (priority = $1)
                Filter: (ignoreuntiltime < (now() + '00:05:00'::interval))
  ->  Index Scan using ix_web_pages_id on web_pages  (cost=0.56..8.58 rows=1 width=798) (actual time=2312.124..2312.124 rows=0 loops=1)
        Index Cond: (id = $2)
        Filter: (state = 'new'::dlstate_enum)
Planning time: 1.712 ms
Execution time: 2313.699 ms
(18 rows)
Table Schema:
Table "public.web_pages"
Column | Type | Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
id | integer | not null default nextval('web_pages_id_seq'::regclass)
state | dlstate_enum | not null
errno | integer |
url | text | not null
starturl | text | not null
netloc | text | not null
file | integer |
priority | integer | not null
distance | integer | not null
is_text | boolean |
limit_netloc | boolean |
title | citext |
mimetype | text |
type | itemtype_enum |
content | text |
fetchtime | timestamp without time zone |
addtime | timestamp without time zone |
tsv_content | tsvector |
normal_fetch_mode | boolean | default true
ignoreuntiltime | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone
Indexes:
"web_pages_pkey" PRIMARY KEY, btree (id)
"ix_web_pages_url" UNIQUE, btree (url)
"idx_web_pages_title" gin (to_tsvector('english'::regconfig, title::text))
"ix_web_pages_distance" btree (distance)
"ix_web_pages_distance_filtered" btree (priority) WHERE state = 'new'::dlstate_enum AND distance < 1000000 AND normal_fetch_mode = true
"ix_web_pages_id" btree (id)
"ix_web_pages_netloc" btree (netloc)
"ix_web_pages_priority" btree (priority)
"ix_web_pages_state" btree (state)
"ix_web_pages_url_ops" btree (url text_pattern_ops)
"web_pages_state_netloc_idx" btree (state, netloc)
Foreign-key constraints:
"web_pages_file_fkey" FOREIGN KEY (file) REFERENCES web_files(id)
Triggers:
update_row_count_trigger BEFORE INSERT OR UPDATE ON web_pages FOR EACH ROW EXECUTE PROCEDURE web_pages_content_update_func()
I’ve experimented with creating compound indexes on multiple columns to try to improve the query performance, without much luck. I have VACUUM ANALYZEd before running the above EXPLAIN.
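For context, the sort of compound index I mean is roughly along these lines (an illustrative sketch with a made-up index name, not the exact DDL I ran):

-- Illustrative sketch only: a compound partial index covering the columns the
-- min(priority) subselect filters on. The ignoreuntiltime comparison against
-- now() isn't immutable, so it can't go in the partial-index predicate; here
-- it's added as a second key column instead.
CREATE INDEX CONCURRENTLY ix_web_pages_priority_ignoreuntil
    ON web_pages (priority, ignoreuntiltime)
    WHERE state = 'new'::dlstate_enum
      AND distance < 1000000
      AND normal_fetch_mode = true;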
The cardinality of the priority column is quite low (about 5 distinct values), while the table itself is fairly large (55,659,673 rows).
Query execution time is rather variable: generally about 2 seconds in the worst case, and around 600 milliseconds in the best case when the entire index is cached in RAM (i.e. when the DB isn’t under other load).
It seems that the bulk of the cost is in the min(priority) subselect, but I haven’t had much luck creating indexes that improve its performance, though that may entirely be operator error:
EXPLAIN ANALYZE
SELECT
    min(priority)
FROM
    web_pages
WHERE
    state = 'new'::dlstate_enum
AND
    distance < 1000000
AND
    normal_fetch_mode = true
AND
    web_pages.ignoreuntiltime < current_timestamp + '5 minutes'::interval;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result  (cost=0.77..0.78 rows=1 width=0) (actual time=625.380..625.381 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.43..0.77 rows=1 width=4) (actual time=625.375..625.375 rows=0 loops=1)
          ->  Index Scan using ix_web_pages_distance_filtered on web_pages  (cost=0.43..176587.44 rows=509043 width=4) (actual time=625.373..625.373 rows=0 loops=1)
                Index Cond: (priority IS NOT NULL)
                Filter: (ignoreuntiltime < (now() + '00:05:00'::interval))
Planning time: 0.475 ms
Execution time: 625.408 ms
(8 rows)
Are there any easy ways to improve the performance of this query? I’ve thought about maintaining a running count of rows for each priority value in an append-only count table that’s updated by triggers, but that’s complex and a fair bit of effort, and I want to be sure there isn’t a simpler approach before implementing all that.
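For reference, the rough shape of what I’m picturing is something like the following: an untested sketch with hypothetical names that only buckets by (state, priority), ignores the distance / normal_fetch_mode / ignoreuntiltime filters, and collapses the append-only deltas into an upsert for brevity.

-- Hypothetical sketch, untested: one counter row per (state, priority) bucket,
-- maintained by a row-level trigger on web_pages.
CREATE TABLE web_pages_priority_counts (
    state    dlstate_enum NOT NULL,
    priority integer      NOT NULL,
    cnt      bigint       NOT NULL DEFAULT 0,
    PRIMARY KEY (state, priority)
);

CREATE OR REPLACE FUNCTION web_pages_priority_counts_update() RETURNS trigger AS $$
BEGIN
    -- decrement the bucket the old row occupied
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE web_pages_priority_counts
           SET cnt = cnt - 1
         WHERE state = OLD.state AND priority = OLD.priority;
    END IF;
    -- increment the bucket the new row occupies
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO web_pages_priority_counts (state, priority, cnt)
        VALUES (NEW.state, NEW.priority, 1)
        ON CONFLICT (state, priority)
            DO UPDATE SET cnt = web_pages_priority_counts.cnt + 1;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER web_pages_priority_counts_trg
    AFTER INSERT OR UPDATE OF state, priority OR DELETE ON web_pages
    FOR EACH ROW EXECUTE PROCEDURE web_pages_priority_counts_update();

-- The min(priority) subselect would then read the (tiny) counts table instead:
-- SELECT min(priority) FROM web_pages_priority_counts
--  WHERE state = 'new' AND cnt > 0;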