Quantcast
Channel: Question and Answer » postgresql
Viewing all articles
Browse latest Browse all 1138

List distinct column values where those rows share other column values

$
0
0

I’m trying to write a query to detect possibly-invalid data in a PostgreSQL table. We have a table of city names like this:

# `city_names`
id    | name     |   language | dialect | city_id
------------------------------------------------
01    | London   |   A        | A1      | 1
02    | London   |   A        | A2      | 1
03    | London   |   B        | B1      | 2
04    | London   |   B        | B2      | 3

In our domain:

  • It’s fine that rows 01 and 02 both map “London” to city 1; the dialects don’t happen to differ
  • It’s fine that row 03 maps “London” to city 2; in that language, the name may refer to a different city
  • It’s suspicious that row 04 maps “London” to city 3, because we already have a mapping to city 2 in the same language

I want to write a query that selects only rows 03 and 04 so that a human can decide whether one of them points to the wrong city.

I can solve this problem procedurally, but I’m having trouble doing it in SQL. For example, if I GROUP BY language and name, I lose the city_id values from the individual rows.

Basically my goal is: “If there’s more than one city_id for the same name and language, list those city_ids.”

How can I do this?


Viewing all articles
Browse latest Browse all 1138

Trending Articles