
Slow queries on a table with billions of rows // index used


Since I’m a young developer and not really skilled in using databases (PostgreSQL 9.3), I ran into some problems with a project that I really need help with.
My project is about collecting data from devices (up to 1000 or more), where every device sends one data block every second, which makes about 3.6 million rows per hour at 1000 devices.
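
As a quick check of that rate, the arithmetic can be run in psql. This is only a sketch that assumes exactly 1000 devices, each sending one row per second; the column aliases are made up for illustration.

```sql
-- Assumption: 1000 devices, one data block per device per second.
SELECT 1000 * 3600      AS rows_per_hour,
       1000 * 3600 * 24 AS rows_per_day;
```

Under those assumptions this gives 3,600,000 rows per hour and 86,400,000 rows per day.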

Currently I’ve got one big table where I store the incoming data of every device.

CREATE TABLE data_block(
    id bigserial PRIMARY KEY,  -- PRIMARY KEY assumed: the foreign keys below need a unique id
    timestamp timestamp,
    mac bigint
);

Because there are several types of data that a data_block can (or cannot) include, there are other tables that reference the data_block table.

CREATE TABLE dataA(
    data_block_id bigserial,
    data,  -- column type omitted here

    CONSTRAINT fkey FOREIGN KEY (data_block_id) REFERENCES data_block(id)
);
CREATE TABLE dataB(...);
CREATE TABLE dataC(...);
CREATE INDEX index_dataA_block_id ON dataA (data_block_id DESC);
...

(e.g. one data_block may contain 3x dataA, 1x dataB, and no dataC)

The data will be kept for some weeks, so I’m going to have ~5 billion rows in this table. At the moment I have ~600 million rows in the table and my queries take really long.
So I decided to create an index on timestamp and mac, because my SELECT statements always filter on time and often also on time+mac.

CREATE INDEX index_ts_mac ON data_block (timestamp DESC, mac);

… but my queries still take ages.
For example, I queried the data of one day and one mac:

SELECT * FROM data_block WHERE timestamp>'2014-09-15' AND timestamp<'2014-09-17' AND mac=123456789  

Index Scan using index_ts_mac on data_block  (cost=0.57..957307.24 rows=315409 width=32) (actual time=39.849..334534.972 rows=285857 loops=1)
  Index Cond: ((timestamp > '2014-09-14 00:00:00'::timestamp without time zone) AND (timestamp < '2014-09-16 00:00:00'::timestamp without time zone) AND (mac = 123456789))
Total runtime: 334642.078 ms

(I did a full vacuum before running the query.)

Is there an elegant way to solve such a problem with big tables, so that a query takes <10 sec?
I read about partitioning, but this won’t work with my dataA, dataB, dataC references to data_block_id, right? If it would work somehow, should I partition by time or by mac?

Maybe there is a way to help me out here.
Thank you.


* ..::: Edit :::.. *
Now I have changed my index to the other column order: first mac, then timestamp, and it gained a lot of performance. Thank you very much.

CREATE INDEX index_mac_ts ON data_block (mac, timestamp DESC);

But queries still take >30 sec, especially when I do a LEFT JOIN with my data tables.
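
A join of this kind looks roughly like the following. This is a sketch, not the exact statement: the selected columns, the aliases b and a, and the date range are assumptions chosen for illustration.

```sql
-- Sketch of a LEFT JOIN against one of the data tables.
-- Aliases, selected columns, and the date range are illustrative assumptions.
SELECT b.id, b.timestamp, b.mac, a.data
FROM data_block AS b
LEFT JOIN dataA AS a ON a.data_block_id = b.id
WHERE b.mac = 123456789
  AND b.timestamp > '2014-10-04 00:00:00'
  AND b.timestamp < '2014-10-05 00:00:00';
```

Because it is a LEFT JOIN, a data_block without any dataA rows still comes back once, with a.data as NULL.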


* ..::: Edit2 :::.. *
Here is an EXPLAIN ANALYZE of the query with the new index. It still takes >30 sec. Is there something else (except partitioning) that I can do?

EXPLAIN ANALYZE SELECT * FROM data_block WHERE mac = 123456789 AND timestamp < '2014-10-05 00:00:00' AND timestamp > '2014-10-04 00:00:00'

Bitmap Heap Scan on data_block  (cost=1514.57..89137.07 rows=58667 width=28) (actual time=2420.842..32353.678 rows=51342 loops=1)
  Recheck Cond: ((mac = 123456789) AND (timestamp < '2014-10-05 00:00:00'::timestamp without time zone) AND (timestamp > '2014-10-04 00:00:00'::timestamp without time zone))
  ->  Bitmap Index Scan on index_mac_ts  (cost=0.00..1499.90 rows=58667 width=0) (actual time=2399.291..2399.291 rows=51342 loops=1)
        Index Cond: ((mac = 123456789) AND (timestamp < '2014-10-05 00:00:00'::timestamp without time zone) AND (timestamp > '2014-10-04 00:00:00'::timestamp without time zone))
Total runtime: 32360.620 ms 
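
To see whether that time goes into reading pages from disk rather than into CPU work, the same query can be run with the BUFFERS option of EXPLAIN, which is available in 9.3. A minimal sketch:

```sql
-- Same query as above; BUFFERS adds shared-buffer hit/read counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM data_block
WHERE mac = 123456789
  AND timestamp < '2014-10-05 00:00:00'
  AND timestamp > '2014-10-04 00:00:00';
```

A large "read" count next to a small "hit" count on the Bitmap Heap Scan would suggest that the matching rows are scattered over many pages that are not in cache.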

* ..::: Edit3 :::.. *
Partitioning by MAC address, to get more but smaller tables, did the trick.
Thank you everyone :)
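
For reference, partitioning in 9.3 is done with table inheritance and CHECK constraints. The following is a minimal sketch of that approach and not necessarily the exact setup used here: the child table and index names are hypothetical, and it assumes one child table per device.

```sql
-- Hypothetical child table holding the rows of a single device.
CREATE TABLE data_block_mac_123456789 (
    CHECK (mac = 123456789)
) INHERITS (data_block);

-- Indexes are not inherited, so each child needs its own.
CREATE INDEX index_ts_mac_123456789
    ON data_block_mac_123456789 (timestamp DESC);

-- With constraint_exclusion set to on (or partition), the planner can skip
-- children whose CHECK constraint rules out the requested mac.
EXPLAIN SELECT * FROM data_block
WHERE mac = 123456789
  AND timestamp > '2014-10-04 00:00:00'
  AND timestamp < '2014-10-05 00:00:00';
```

New rows have to be inserted into the matching child directly or routed there by a trigger on data_block. Note that a foreign key referencing data_block(id) does not see rows stored in child tables, so the constraints from dataA, dataB, and dataC need separate handling with this layout.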


Unfortunately my hardware is strictly limited. I’m using an Intel i3-2100 @ 3.10 GHz with 4 GB RAM.
My current settings are as follows:

default_statistics_target = 100
maintenance_work_mem = 512MB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 4GB
work_mem = 512MB
wal_buffers = 16MB
checkpoint_segments = 32
shared_buffers = 2GB
max_connections = 20
random_page_cost = 2
