Sort by number of related rows in referencing table

Let there be two tables:

Users

id [pk] |   name
--------+---------
      1 | Alice
      2 | Bob
      3 | Charlie
      4 | Dan

Emails

 id | user_id | email 
----+---------+-------
  1 |       1 | a.1
  2 |       1 | a.2
  3 |       2 | a.3
  4 |       2 | b.1
  5 |       2 | a.4
  6 |       2 | a.5
  7 |       3 | b.2
  8 |       3 | a.6
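
For anyone who wants to reproduce this, here is a minimal setup sketch. The column types, the NOT NULL constraints and the foreign key are assumptions made for the example; only the table names, column names and rows come from the tables above.

```sql
-- Assumed schema: types and constraints are illustrative guesses.
CREATE TABLE users (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE emails (
    id      integer PRIMARY KEY,
    user_id integer NOT NULL REFERENCES users (id),
    email   text NOT NULL
);

-- Sample rows exactly as listed in the tables above.
INSERT INTO users (id, name) VALUES
    (1, 'Alice'),
    (2, 'Bob'),
    (3, 'Charlie'),
    (4, 'Dan');

INSERT INTO emails (id, user_id, email) VALUES
    (1, 1, 'a.1'),
    (2, 1, 'a.2'),
    (3, 2, 'a.3'),
    (4, 2, 'b.1'),
    (5, 2, 'a.4'),
    (6, 2, 'a.5'),
    (7, 3, 'b.2'),
    (8, 3, 'a.6');
```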

I want to retrieve the following data:

user’s id and name
count of user’s emails
user’s email and its id

I’d like the output to include only emails starting with ‘a’ and to be ordered by the number of such emails per user, descending.

I’d like to retrieve it within one query.

Here is my query:

SELECT users.id AS user_id, users.name AS name,
       emails.id AS email_id, emails.email AS email,
       count(emails.id) OVER (PARTITION BY users.id) as n_emails
FROM users
LEFT JOIN emails on users.id = emails.user_id
WHERE emails.email LIKE 'a' || '%%'
ORDER BY n_emails DESC;

The result is what I expected:

 user_id |  name   | email_id | email | n_emails 
---------+---------+----------+-------+----------
       2 | Bob     |        6 | a.5   |        3
       2 | Bob     |        5 | a.4   |        3
       2 | Bob     |        3 | a.3   |        3
       1 | Alice   |        2 | a.2   |        2
       1 | Alice   |        1 | a.1   |        2
       3 | Charlie |        8 | a.6   |        1

This is a small, simple example, but the actual dataset could be large, so I’d like to use LIMIT/OFFSET for paging. For example, I’d like to fetch the first two users (not just the first two rows):

-- previous query ...
LIMIT 2 OFFSET 0;

And it fails. I get only two rows, which is incomplete information about Bob alone:

 user_id | name | email_id | email | n_emails 
---------+------+----------+-------+----------
       2 | Bob  |        6 | a.5   |        3
       2 | Bob  |        5 | a.4   |        3

Hence the question: how can I apply limit/offset to objects, in this case, users (logical entities, not rows)?

I’ve found the following solution: add dense_rank() over users.id and then filter by rank:

SELECT * FROM (
    SELECT users.id AS user_id, users.name AS name,
           emails.id AS email_id, emails.email AS email,
           count(emails.id) OVER (PARTITION BY users.id) as n_emails,
           dense_rank() OVER (ORDER BY users.id) as n_user
    FROM users
    LEFT JOIN emails on users.id = emails.user_id
    WHERE emails.email LIKE 'a' || '%%'
    ORDER BY n_emails DESC
    ) AS sq
WHERE sq.n_user <= 2; -- here it is

The output looks good:

 user_id | name  | email_id | email | n_emails | n_user 
---------+-------+----------+-------+----------+--------
       2 | Bob   |        6 | a.5   |        3 |      2
       2 | Bob   |        5 | a.4   |        3 |      2
       2 | Bob   |        3 | a.3   |        3 |      2
       1 | Alice |        2 | a.2   |        2 |      1
       1 | Alice |        1 | a.1   |        2 |      1

But if you look at the query plan, you’ll see that the most expensive steps are the subquery scan and the sort. AFAIK it is impossible to build an index on a subquery or CTE, so the filter on n_user will always be applied by scanning every row of the subquery result, and the query will take a long time on a big dataset.
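
The plan mentioned above can be inspected by prefixing the query with EXPLAIN. This is a sketch that assumes PostgreSQL; note that the ANALYZE option actually executes the query, and BUFFERS adds buffer usage to the output. The plan itself depends on the data volume and available indexes, so no output is shown here.

```sql
-- Show the actual execution plan, timings and buffer usage
-- for the dense_rank() variant of the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM (
    SELECT users.id AS user_id, users.name AS name,
           emails.id AS email_id, emails.email AS email,
           count(emails.id) OVER (PARTITION BY users.id) AS n_emails,
           dense_rank() OVER (ORDER BY users.id) AS n_user
    FROM users
    LEFT JOIN emails ON users.id = emails.user_id
    WHERE emails.email LIKE 'a' || '%%'
    ORDER BY n_emails DESC
    ) AS sq
WHERE sq.n_user <= 2;
```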

Another solution I see is to split the work into two steps:

retrieve only the user ids and the number of emails for the filtered and sorted dataset, using a subquery with LIMIT/OFFSET;
join that subquery with users and emails.

The query is:

SELECT users.id AS user_id, users.name,
       emails.id AS email_id, emails.email,
       sq.n_emails
FROM
(SELECT users.id, count(emails.id) AS n_emails
    FROM users
    LEFT JOIN emails ON users.id = emails.user_id
    WHERE emails.email LIKE 'a' || '%%'
    GROUP BY users.id
    ORDER BY n_emails DESC
    LIMIT 2 OFFSET 0 -- here it is
    ) AS sq
JOIN users ON users.id = sq.id
LEFT JOIN emails ON emails.user_id = users.id
WHERE emails.email LIKE 'a' || '%%'
ORDER BY sq.n_emails DESC;

This seems to be much faster. But it doesn’t look like a good solution, because I have to duplicate exactly the same query (except for the SELECT ... FROM part); in effect, one query runs twice. Is there a better solution?
