Disclaimer: I am relatively new to PostgreSQL.
I’m wondering how to optimize a query that does 2 INNER JOIN
s. My scenario is fairly simple:
Select Posts with a photo (Posts.photo IS NOT NULL
) and a Hashtag that has the name ‘dead’ (Hashtags.name = 'dead'
).
The associations are as follows:
Posts <- PostHashtags -> Hashtags
Posts.id = PostHashtags.postId (FK)
Hashtags.id = PostHashtags.hashtagId (FK)
Here is the query:
SELECT
"Posts".*,
"hashtags"."id" AS "hashtags.id",
"hashtags"."count" AS "hashtags.count",
"hashtags"."name" AS "hashtags.name",
"hashtags"."createdAt" AS "hashtags.createdAt",
"hashtags"."updatedAt" AS "hashtags.updatedAt",
"hashtags"."objectId" AS "hashtags.objectId",
"hashtags"."_etl" AS "hashtags._etl",
"hashtags.PostHashtag"."id" AS "hashtags.PostHashtag.id",
"hashtags.PostHashtag"."createdAt" AS "hashtags.PostHashtag.createdAt",
"hashtags.PostHashtag"."updatedAt" AS "hashtags.PostHashtag.updatedAt",
"hashtags.PostHashtag"."postId" AS "hashtags.PostHashtag.postId",
"hashtags.PostHashtag"."hashtagId" AS "hashtags.PostHashtag.hashtagId",
"hashtags.PostHashtag"."objectId" AS "hashtags.PostHashtag.objectId",
"hashtags.PostHashtag"."_etl" AS "hashtags.PostHashtag._etl"
FROM (
SELECT
"Posts"."id",
"Posts"."note",
"Posts"."photo",
"Posts"."createdAt",
"user"."id" AS "user.id",
"user"."name" AS "user.name"
FROM "Posts" AS "Posts"
INNER JOIN "Users" AS "user" ON "Posts"."userId" = "user"."id"
WHERE "Posts"."photo" IS NOT NULL
AND (
SELECT "PostHashtags"."id" FROM "PostHashtags" AS "PostHashtags"
INNER JOIN "Hashtags" AS "Hashtag" ON "PostHashtags"."hashtagId" = "Hashtag"."id"
WHERE "Posts"."id" = "PostHashtags"."postId"
LIMIT 1
) IS NOT NULL
ORDER BY "Posts"."createdAt" DESC LIMIT 10
) AS "Posts"
INNER JOIN (
"PostHashtags" AS "hashtags.PostHashtag"
INNER JOIN "Hashtags" AS "hashtags" ON "hashtags"."id" = "hashtags.PostHashtag"."hashtagId"
)
ON "Posts"."id" = "hashtags.PostHashtag"."postId"
AND "hashtags"."name" = 'dead'
ORDER BY "Posts"."createdAt" DESC;
EXPLAIN results:
Nested Loop (cost=886222912.89..886223769.55 rows=1 width=277)
Join Filter: ("hashtags.PostHashtag"."postId" = "Posts".id)
-> Limit (cost=886220835.39..886220835.42 rows=10 width=189)
-> Sort (cost=886220835.39..886220988.88 rows=61394 width=189)
Sort Key: "Posts"."createdAt"
-> Nested Loop (cost=0.42..886219508.69 rows=61394 width=189)
-> Seq Scan on "Posts" (cost=0.00..885867917.51 rows=78196 width=177)
Filter: ((photo IS NOT NULL) AND ((SubPlan 1) IS NOT NULL))
SubPlan 1
-> Limit (cost=0.42..815.70 rows=1 width=4)
-> Nested Loop (cost=0.42..815.70 rows=1 width=4)
-> Seq Scan on "PostHashtags" (cost=0.00..811.25 rows=1 width=8)
Filter: ("Posts".id = "postId")
-> Index Only Scan using "Hashtags_pkey" on "Hashtags" "Hashtag" (cost=0.42..4.44 rows=1 width=4)
Index Cond: (id = "PostHashtags"."hashtagId")
-> Index Scan using "Users_pkey" on "Users" "user" (cost=0.42..4.49 rows=1 width=16)
Index Cond: (id = "Posts"."userId")
-> Materialize (cost=2077.50..2933.89 rows=1 width=88)
-> Hash Join (cost=2077.50..2933.89 rows=1 width=88)
Hash Cond: ("hashtags.PostHashtag"."hashtagId" = hashtags.id)
-> Seq Scan on "PostHashtags" "hashtags.PostHashtag" (cost=0.00..721.00 rows=36100 width=40)
-> Hash (cost=2077.49..2077.49 rows=1 width=48)
-> Seq Scan on "Hashtags" hashtags (cost=0.00..2077.49 rows=1 width=48)
Filter: ((name)::text = 'dead'::text)
This query has been simplified slightly. It also performs OUTER JOINS
on other data related to Posts
, which is why the SELECT
must be performed on Posts
instead of, say, PostHashtags
.
Any help in translating the EXPLAIN
to a useful index would be greatly appreciated.
My ideas:
- Build an index on
Posts.photo
, but should it be a partial indexWHERE "photo" IS NOT NULL
? - Build a
UNIQUE
index onHashtags.name
.
I’m not sure if those are necessarily the bottlenecks, though.