Why are correlated subqueries sometimes faster than joins in Postgres?

This relates to the dataset described in "Postgres is performing sequential scan instead of index scan".

I’ve started work on adapting the import logic to work with a more normalised schema – no surprises here: it’s faster and more compact – but I’ve hit a roadblock updating the existing data: adding and updating the relevant foreign keys is taking an age.

UPDATE pages
SET id_site = sites.id
FROM sites
WHERE sites.url = pages."urlShort"
AND "labelDate" = '2015-01-15';

NB pages."urlShort" and sites.url are text fields; both are indexed but currently have no explicit relationship.
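For reference, a rough sketch of the two tables as I understand them – column types and index names beyond those appearing in the plans below are my assumption, not the exact DDL:

```sql
-- Hypothetical minimal schema matching the queries in this question;
-- the real tables have more columns.
CREATE TABLE sites (
    id  serial PRIMARY KEY,
    url text NOT NULL
);
CREATE INDEX sites_url_idx ON sites (url);          -- assumed name

CREATE TABLE pages (
    id          serial PRIMARY KEY,
    "urlShort"  text,
    "labelDate" date,
    id_site     integer           -- to be populated; no FK constraint yet
);
CREATE INDEX "pages_urlShort_idx" ON pages ("urlShort");   -- assumed name
CREATE INDEX "pages_labelDate_idx" ON pages ("labelDate"); -- name taken from the plans below
```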

There are around 500,000 rows for each date value and updates like this are taking around 2h30. :-(

I looked at what the underlying query might look like:

select * from pages
join sites on
sites.url = pages."urlShort"
where "labelDate" = '2015-01-01'

This takes around 6 minutes to run and has a query plan like this:

"Hash Join  (cost=80226.81..934763.02 rows=493018 width=365)"
"  Hash Cond: ((pages."urlShort")::text = sites.url)"
"  ->  Bitmap Heap Scan on pages  (cost=13549.32..803595.26 rows=493018 width=315)"
"        Recheck Cond: ("labelDate" = '2015-01-01'::date)"
"        ->  Bitmap Index Scan on "pages_labelDate_idx"  (cost=0.00..13426.07 rows=493018 width=0)"
"              Index Cond: ("labelDate" = '2015-01-01'::date)"
"  ->  Hash  (cost=30907.66..30907.66 rows=1606466 width=50)"
"        ->  Seq Scan on sites  (cost=0.00..30907.66 rows=1606466 width=50)"

Based on some help in the past on related subjects I decided to compare this with a similar query that used a correlated subquery instead of a join.

SELECT "urlShort" AS url
FROM pages
WHERE 
"labelDate" = '2015-01-01'
and id_site is NULL
AND EXISTS
(SELECT * FROM sites
     WHERE sites.url = pages."urlShort")

This query only takes about 15s to run and has the following query plan:

"Hash Join  (cost=64524.36..860389.62 rows=423223 width=27)"
"  Hash Cond: ((pages."urlShort")::text = sites.url)"
"  ->  Bitmap Heap Scan on pages  (cost=13535.88..803581.81 rows=423223 width=27)"
"        Recheck Cond: ("labelDate" = '2015-01-01'::date)"
"        Filter: (id_site IS NULL)"
"        ->  Bitmap Index Scan on "pages_labelDate_idx"  (cost=0.00..13430.07 rows=493018 width=0)"
"              Index Cond: ("labelDate" = '2015-01-01'::date)"
"  ->  Hash  (cost=30907.66..30907.66 rows=1606466 width=27)"
"        ->  Seq Scan on sites  (cost=0.00..30907.66 rows=1606466 width=27)"

There are two things I’d like to know:
1) Can I adjust the update to run faster based on the above?
2) What parts of the query plan are telltales for running slowly? Or do you always have to run EXPLAIN ANALYZE to find out?
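For context, this is how I’ve been capturing the plans above. Plain EXPLAIN only shows the planner’s estimates; EXPLAIN ANALYZE actually executes the statement and reports real times and row counts, so for the UPDATE I wrap it in a transaction and roll back:

```sql
-- EXPLAIN shows estimated costs without running the query.
EXPLAIN
SELECT "urlShort" AS url
FROM pages
WHERE "labelDate" = '2015-01-01'
  AND id_site IS NULL
  AND EXISTS (SELECT 1 FROM sites WHERE sites.url = pages."urlShort");

-- EXPLAIN ANALYZE executes the statement, so wrap data-modifying
-- statements in a transaction and roll back to avoid side effects.
BEGIN;
EXPLAIN ANALYZE
UPDATE pages
SET id_site = sites.id
FROM sites
WHERE sites.url = pages."urlShort"
  AND "labelDate" = '2015-01-15';
ROLLBACK;
```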

