Quantcast
Channel: Question and Answer » postgresql
Viewing all articles
Browse latest Browse all 1138

What indexes should one use to optimize a PostgreSQL query with a JOIN depth of 2?

$
0
0

Disclaimer: I am relatively new to PostgreSQL.

I’m wondering how to optimize a query that does 2 INNER JOINs. My scenario is fairly simple:

Select Posts with a photo (Posts.photo IS NOT NULL) and a Hashtag that has the name ‘dead’ (Hashtags.name = 'dead').

The associations are as follows:

Posts <- PostHashtags -> Hashtags

Posts.id    = PostHashtags.postId (FK)
Hashtags.id = PostHashtags.hashtagId (FK)

Here is the query:

SELECT
  "Posts".*,
  "hashtags"."id" AS "hashtags.id",
  "hashtags"."count" AS "hashtags.count",
  "hashtags"."name" AS "hashtags.name",
  "hashtags"."createdAt" AS "hashtags.createdAt",
  "hashtags"."updatedAt" AS "hashtags.updatedAt",
  "hashtags"."objectId" AS "hashtags.objectId",
  "hashtags"."_etl" AS "hashtags._etl",
  "hashtags.PostHashtag"."id" AS "hashtags.PostHashtag.id",
  "hashtags.PostHashtag"."createdAt" AS "hashtags.PostHashtag.createdAt",
  "hashtags.PostHashtag"."updatedAt" AS "hashtags.PostHashtag.updatedAt",
  "hashtags.PostHashtag"."postId" AS "hashtags.PostHashtag.postId",
  "hashtags.PostHashtag"."hashtagId" AS "hashtags.PostHashtag.hashtagId",
  "hashtags.PostHashtag"."objectId" AS "hashtags.PostHashtag.objectId",
  "hashtags.PostHashtag"."_etl" AS "hashtags.PostHashtag._etl"

FROM (
  SELECT
    "Posts"."id",
    "Posts"."note",
    "Posts"."photo",
    "Posts"."createdAt",
    "user"."id" AS "user.id",
    "user"."name" AS "user.name"
  FROM "Posts" AS "Posts"

  INNER JOIN "Users" AS "user" ON "Posts"."userId" = "user"."id"

  WHERE "Posts"."photo" IS NOT NULL
  AND (
    SELECT "PostHashtags"."id" FROM "PostHashtags" AS "PostHashtags"
    INNER JOIN "Hashtags" AS "Hashtag" ON "PostHashtags"."hashtagId" = "Hashtag"."id"
    WHERE "Posts"."id" = "PostHashtags"."postId"
    LIMIT 1
  ) IS NOT NULL

  ORDER BY "Posts"."createdAt" DESC LIMIT 10
) AS "Posts"

INNER JOIN (
  "PostHashtags" AS "hashtags.PostHashtag"
  INNER JOIN "Hashtags" AS "hashtags" ON "hashtags"."id" = "hashtags.PostHashtag"."hashtagId"
)

ON "Posts"."id" = "hashtags.PostHashtag"."postId"
AND "hashtags"."name" = 'dead'

ORDER BY "Posts"."createdAt" DESC;

EXPLAIN results:

Nested Loop  (cost=886222912.89..886223769.55 rows=1 width=277)
  Join Filter: ("hashtags.PostHashtag"."postId" = "Posts".id)
  ->  Limit  (cost=886220835.39..886220835.42 rows=10 width=189)
        ->  Sort  (cost=886220835.39..886220988.88 rows=61394 width=189)
              Sort Key: "Posts"."createdAt"
              ->  Nested Loop  (cost=0.42..886219508.69 rows=61394 width=189)
                    ->  Seq Scan on "Posts"  (cost=0.00..885867917.51 rows=78196 width=177)
                          Filter: ((photo IS NOT NULL) AND ((SubPlan 1) IS NOT NULL))
                          SubPlan 1
                            ->  Limit  (cost=0.42..815.70 rows=1 width=4)
                                  ->  Nested Loop  (cost=0.42..815.70 rows=1 width=4)
                                        ->  Seq Scan on "PostHashtags"  (cost=0.00..811.25 rows=1 width=8)
                                              Filter: ("Posts".id = "postId")
                                        ->  Index Only Scan using "Hashtags_pkey" on "Hashtags" "Hashtag"  (cost=0.42..4.44 rows=1 width=4)
                                              Index Cond: (id = "PostHashtags"."hashtagId")
                    ->  Index Scan using "Users_pkey" on "Users" "user"  (cost=0.42..4.49 rows=1 width=16)
                          Index Cond: (id = "Posts"."userId")
  ->  Materialize  (cost=2077.50..2933.89 rows=1 width=88)
        ->  Hash Join  (cost=2077.50..2933.89 rows=1 width=88)
              Hash Cond: ("hashtags.PostHashtag"."hashtagId" = hashtags.id)
              ->  Seq Scan on "PostHashtags" "hashtags.PostHashtag"  (cost=0.00..721.00 rows=36100 width=40)
              ->  Hash  (cost=2077.49..2077.49 rows=1 width=48)
                    ->  Seq Scan on "Hashtags" hashtags  (cost=0.00..2077.49 rows=1 width=48)
                          Filter: ((name)::text = 'dead'::text)

This query has been simplified slightly. It also performs OUTER JOINS on other data related to Posts, which is why the SELECT must be performed on Posts instead of, say, PostHashtags.

Any help in translating the EXPLAIN to a useful index would be greatly appreciated.

My ideas:

  1. Build an index on Posts.photo, but should it be a partial index WHERE "photo" IS NOT NULL?
  2. Build a UNIQUE index on Hashtags.name.

I’m not sure if those are necessarily the bottlenecks, though.


Viewing all articles
Browse latest Browse all 1138

Trending Articles