
Managing and speeding up queries on PostgreSQL table with over 3 trillion rows


I have time-series data that spans over 10 years and has over 3 trillion rows and 10 columns.

At the moment the database sits on a PCIe SSD and the server has 128 GB of RAM, yet querying takes a significant amount of time.
For example, running the query below takes well over 15 minutes:

SELECT * FROM table_name WHERE "COLUMN_A" = 'Value1' AND "COLUMN_B" = 'Value2';

The table is mostly used for reads. The only time it is written to is during weekly updates, which insert about 15 million rows.

What is the best way to manage a table this large? Would you recommend partitioning it by year?
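
For reference, a minimal sketch of what splitting by year could look like, assuming a PostgreSQL version with declarative partitioning (10 or later); the partitioned table name, partition names and date bounds are illustrative, and the columns follow the table definition further below:

-- Parent table, range-partitioned by year on "DATE"
CREATE TABLE table_name_partitioned
(
  "DATE" timestamp with time zone,
  "COLUMN_A" text,
  "COLUMN_B" text,
  "VALUE_1" double precision
  -- ... remaining value columns ...
)
PARTITION BY RANGE ("DATE");

-- One partition per year of data (repeat for each year in the set)
CREATE TABLE table_name_y2014 PARTITION OF table_name_partitioned
  FOR VALUES FROM ('2014-01-01') TO ('2015-01-01');
CREATE TABLE table_name_y2015 PARTITION OF table_name_partitioned
  FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');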

The table size is 542 GB and the external size is 109 GB.
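
For reference, figures like these can be obtained with the standard size functions (assuming "external size" means everything outside the heap, i.e. indexes and TOAST):

SELECT pg_size_pretty(pg_relation_size('table_name')) AS table_size,
       pg_size_pretty(pg_total_relation_size('table_name')
                      - pg_relation_size('table_name')) AS external_size;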

EXPLAIN (BUFFERS, ANALYZE) output:

"Seq Scan on table  (cost=0.00..116820941.44 rows=758 width=92) (actual time=0.011..1100643.844 rows=667 loops=1)"
"  Filter: (("COLUMN_A" = 'Value1'::text) AND ("COLUMN_B" = 'Value2'::text))"
"  Rows Removed by Filter: 4121893840"
"  Buffers: shared hit=2 read=56640470 dirtied=476248 written=476216"
"Total runtime: 1100643.967 ms"

The table was created using the following code:

CREATE TABLE table_name
(
  "DATE" timestamp with time zone,
  "COLUMN_A" text,
  "COLUMN_B" text,
  "VALUE_1" double precision,
  "VALUE_2" double precision,
  "VALUE_3" double precision,
  "VALUE_4" double precision,
  "VALUE_5" double precision,
  "VALUE_6" double precision,
  "VALUE_7" double precision
)
WITH (
  OIDS=FALSE
);
ALTER TABLE table_name
  OWNER TO user_1;

CREATE INDEX "ix_table_name_DATE"
  ON table_name
  USING btree
  ("DATE");
