I have a table that stores latency measurements between nodes running MPI tasks on a large cluster. The table looks like this:
CREATE TABLE latency(
    from_rank int,
    to_rank   int,
    from_host varchar(20),
    to_host   varchar(20),
    from_cpu  varchar(20),
    to_cpu    varchar(20),
    latency   float8
);
CREATE INDEX ON latency(from_host, to_host);
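The index is there to speed up per-host lookups along these lines (the host names are hypothetical, just for illustration):

-- A typical point lookup the (from_host, to_host) index should serve;
-- 'node001' and 'node002' are made-up host names:
SELECT avg(latency)
FROM latency
WHERE from_host = 'node001'
  AND to_host = 'node002';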
Now, after a large experiment, I have collected over 500 million rows of data, and I find querying it painfully slow. Below is an example of a SELECT COUNT(*):
psql (9.4devel)
Type "help" for help.
routing=# \timing
Timing is on.
routing=# SELECT COUNT(*) FROM latency;
count
-----------
522190848
(1 row)
Time: 759462.969 ms
routing=# SELECT COUNT(*) FROM latency;
count
-----------
522190848
(1 row)
Time: 96775.036 ms
routing=# SELECT COUNT(*) FROM latency;
count
-----------
522190848
(1 row)
Time: 97708.132 ms
routing=#
I am running both the PostgreSQL server and the client on the same machine, which has 4 Xeon E7-4870s (40 cores/80 threads in total) and 1 TB of RAM. The effect of Linux file caching is obvious: the first query took well over 12 minutes, while the subsequent ones took about 1.5 minutes each.
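For scale, the table's on-disk footprint can be checked like this (I have not pasted the output here; with ~522 million rows of this width I would expect several tens of GB):

-- Total size of the table including its indexes:
SELECT pg_size_pretty(pg_total_relation_size('latency'));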
Is there anything I can do to make the query run faster? 1.5 minutes isn't exactly responsive.
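(If an approximate figure were acceptable, I believe the planner's estimate could be read straight from the catalog instead of scanning the table, roughly like below; but I need exact counts here.)

-- Approximate row count from planner statistics; kept current by
-- ANALYZE/autovacuum, so it is an estimate, not an exact count:
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'latency';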
Thanks.