
Syntax error in postgresql query [closed]


I am trying to migrate a SQL query to PostgreSQL. After running the query I get this error:

ERROR: syntax error at or near "("
LINE 20: ..._date is null or fcm.effect_end_date>=current_date()) limit ...

Below is my query:

select farechart_master_id,farechart_name,version_number_service_stype,
fcm.route_id,st.service_type_name,fcm.passenger_type_id,
fcm.effect_start_date,fcm.effect_end_date,fcm.nignt_service,
fcm.peak_time,fcm.flexi_fare,r.route_number,r.route_direction,
r.effective_from,r.effective_till from farechart_master fcm 
left join rate_master rm on rm.rate_master_id=fcm.rate_master_id
left join route r on r.route_id=fcm.route_id 
left join service_type st on st.service_type_id=fcm.service_type_id 
where fcm.deleted_status=0 
and (fcm.effect_end_date is null or fcm.effect_end_date>=current_date())
limit 0 offset 10

Where am I going wrong?
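
For what it's worth, in PostgreSQL current_date is an SQL-standard keyword (like current_timestamp), not a zero-argument function, so it is written without parentheses; that matches the position the parser is complaining about. A minimal sketch of the corrected predicate, with the rest of the query left as posted:

-- current_date is a keyword in PostgreSQL, not a function, so the parentheses have to go
where fcm.deleted_status=0
and (fcm.effect_end_date is null or fcm.effect_end_date>=current_date)
limit 0 offset 10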


Accurate PG database size for comparing representations


I have a Java/JDBC program that takes a sample file and imports the data into the database, shredding it across multiple relations. The program does this multiple times for several different representations, one of which uses Large Objects. I can share more details of these representations but they are quite long and aren’t relevant to this question since I’m looking for something generic.

I would like to compare the sizes of these different representations by examining the size of the database after each import. The database is on a PostgreSQL 9.4 local Windows server instance, with no other users and default configuration. Its only purpose is to conduct this test.

My initial plan was as follows:

for each representation {
  call VACUUM ANALYZE
  record old DB size with SELECT pg_tablespace_size('pg_default');
  import data into database
  call VACUUM ANALYZE
  record new DB size with SELECT pg_tablespace_size('pg_default');
  store storage cost as new DB size - old DB size
}
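
As a concrete sketch of the two measurement calls in that loop (pg_database_size is shown alongside purely as a cross-check; it is not part of the plan above):

-- size of the default tablespace, which holds the database files including
-- pg_catalog.pg_largeobject unless tables were moved elsewhere
SELECT pg_size_pretty(pg_tablespace_size('pg_default'));

-- cross-check: size of just the current database (also counts large objects,
-- but excludes other databases sharing the tablespace)
SELECT pg_size_pretty(pg_database_size(current_database()));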

Obviously there are limitations to this approach, but my expectation is that for large files (~100MB) the reported storage costs should be reasonable approximations. Note that I use pg_tablespace_size in order to include the contribution of data outside of the main schema, such as large objects (in pg_catalog.pg_largeobject and pg_catalog.pg_largeobject_metadata).

I’m wondering whether this is a correct approach, and whether there is a better approach. I’m unsure whether VACUUM ANALYZE properly updates the stats used by pg_tablespace_size, even though it is called in the same session. It would also be better if I could avoid calling VACUUM ANALYZE, since this requires connecting as the superuser in order to run on the pg_catalog relations.

Any thoughts?

Migrating PostgreSQL files to new server


We have a Windows Server 2008 machine that went down because of a CPU failure. We were running parts of the Atlassian stack (JIRA, Confluence, Stash, and Bamboo) on it, using PostgreSQL as the database. I now have the hard drive from that dead server mounted in a Windows 7 machine as a secondary drive. Unfortunately we did not have backups set up.

All of the instructions I see for migrating databases to new machines involve running PostgreSQL CLI programs against a running PostgreSQL instance, but in my case all I have is a folder, not a running PostgreSQL server. I would like to install PostgreSQL on the Windows 7 machine and somehow get the data migrated over to the new installation and off of that secondary drive. Then I would like to export the data using the standard procedures and move it to a new production server, or possibly export it using the Atlassian backup tools if we end up moving the data into the Atlassian cloud service (where they use who knows what database engine). Either way, I need the data off the old drive and running in a new PostgreSQL server to move forward. Anyone have any ideas?
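
As a rough sketch (not a tested procedure): copy the old data directory (the folder containing PG_VERSION, base, global, ...) from the secondary drive to a local path, install the same major PostgreSQL version the old server was running, and point a server instance at the copied directory, e.g.

pg_ctl -D "C:\old_pgdata" start

The path is a placeholder. Once the server starts, the data can be exported with the standard tools (pg_dump / pg_dumpall) and loaded wherever it needs to go.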

How to reinstall PostgreSQL over an existing installation


I am trying to upgrade an installer created with NSIS that installs PostgreSQL. It used to install PostgreSQL 8.3, but we want to upgrade the database to 9.4.4.

The old version of the DB was packaged as an MSI installer, but they have switched to a Windows EXE for the later version. The old version was able to install over an existing installation. The new version breaks and gives me a very general error that I can't use to pinpoint the issue.

Error:

The installation directory must be an absolute path, containing only
letters, numbers and the characters '-', '/', '.' and '_', and must be
writable.

This is my install command:

postgresql-9.4.4-1-windows-x64.exe --prefix "C:postgres" --datadir
"C:postgresdata" --enable_acledit 1 --install_runtimes 0
--serverport 5432 --superpassword "XXXX" --servicepassword "XXXX" --unattendedmodeui minimal --mode unattended --debuglevel 0 --serviceaccount "postgres" --create_shortcuts 0

The Bitrock installer only provides the above error with no path detail or anything to act on.

Any clues or suggestions? (Thanks)

Retrieve additional columns in recursive CTE


This works as far as getting the number of children a "thread" has, but now I can't seem to get it to pull the parent row's columns. Parent rows have parent_id set to null; trees can be nested to any depth.

I managed to do it with two separate queries, but there has to be a way to use just one and still get the count of the children:

with recursive all_comments as (
   select id, parent_id, id as root_id
   from comment
   where parent_id is null
   union all
   select c.id, c.parent_id, p.root_id
   from comment c
     join all_comments p on c.parent_id = p.id
)
select root_id, count(*) as comment_count
from all_comments
group by root_id;

How would I pull the content column from the parent comment in this fiddle?

http://sqlfiddle.com/#!15/158ea/15
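
One common pattern (a sketch only, assuming the parent's text lives in a content column as in the fiddle) is to carry the parent's columns through the recursive part and group by them:

with recursive all_comments as (
   select id, parent_id, id as root_id, content as root_content
   from comment
   where parent_id is null
   union all
   select c.id, c.parent_id, p.root_id, p.root_content
   from comment c
     join all_comments p on c.parent_id = p.id
)
select root_id, root_content, count(*) as comment_count
from all_comments
group by root_id, root_content;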

Safe to Restore Database from Untrusted Sources


I am asking someone to aggregate around 100GB of data for me. I would prefer this to be in a Postgres database instead of something like CSV. I was thinking pg_dump and pg_restore custom format. Is it safe to restore the database onto my machine? Can they inject something malicious in there?

How would I make this query run faster in postgres [closed]


Query:

with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state,
        sum(wr_return_amt) as ctr_total_return
 from web_returns
     ,date_dim
     ,customer_address
 where wr_returned_date_sk = d_date_sk
   and d_year =2002
   and wr_returning_addr_sk = ca_address_sk
 group by wr_returning_customer_sk
         ,ca_state)
  select  c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
       ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
       ,c_last_review_date,ctr_total_return
 from customer_total_return ctr1
     ,customer_address
     ,customer
 where ctr1.ctr_total_return > (select avg(ctr_total_return)*1.2
                          from customer_total_return ctr2
                          where ctr1.ctr_state = ctr2.ctr_state)
       and ca_address_sk = c_current_addr_sk
       and ca_state = 'IL'
       and ctr1.ctr_customer_sk = c_customer_sk
 order by c_customer_id,c_salutation,c_first_name,c_last_name,c_preferred_cust_flag
                  ,c_birth_day,c_birth_month,c_birth_year,c_birth_country,c_login,c_email_address
                  ,c_last_review_date,ctr_total_return
limit 100;

I have indexes created on:

wr_returning_customer_sk
wr_returned_date_sk
d_date_sk
ca_address_sk
wr_returning_addr_sk
ca_address_sk
ca_state, ca_country
c_current_addr_sk
c_customer_sk

EXPLAIN RESULT with enable_nestloop=on;

http://explain.depesz.com/s/F0d

    QUERY PLAN                                                                                                                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7293.09..7293.10 rows=3 width=253) (actual time=16112.713..16112.723 rows=92 loops=1)
   CTE customer_total_return
     ->  HashAggregate  (cost=4563.72..4567.82 rows=328 width=13) (actual time=48.362..50.644 rows=13357 loops=1)
           Group Key: web_returns.wr_returning_customer_sk, customer_address_1.ca_state
           ->  Nested Loop  (cost=0.58..4561.26 rows=328 width=13) (actual time=4.310..41.416 rows=13517 loops=1)
                 ->  Nested Loop  (cost=0.29..4445.05 rows=343 width=14) (actual time=4.304..18.758 rows=13862 loops=1)
                       ->  Seq Scan on date_dim  (cost=0.00..2318.11 rows=365 width=4) (actual time=4.294..8.421 rows=365 loops=1)
                             Filter: (d_year = 2002)
                             Rows Removed by Filter: 72684
                       ->  Index Scan using idx_wr_returned_date_sk on web_returns  (cost=0.29..5.51 rows=32 width=18) (actual time=0.002..0.019 rows=38 loops=365)
                             Index Cond: (wr_returned_date_sk = date_dim.d_date_sk)
                 ->  Index Scan using customer_address_pkey on customer_address customer_address_1  (cost=0.29..0.33 rows=1 width=7) (actual time=0.001..0.001 rows=1 loops=13862)
                       Index Cond: (ca_address_sk = web_returns.wr_returning_addr_sk)
   ->  Sort  (cost=2725.28..2725.29 rows=3 width=253) (actual time=16112.712..16112.717 rows=92 loops=1)
         Sort Key: customer.c_customer_id, customer.c_salutation, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_day, customer.c_birth_month, customer.c_birth_year, customer.c_birth_country, customer.c_login, customer.c_email_address, customer.c_last_review_date, ctr1.ctr_total_return
         Sort Method: quicksort  Memory: 48kB
         ->  Nested Loop  (cost=0.58..2725.25 rows=3 width=253) (actual time=126.693..16112.312 rows=92 loops=1)
               ->  Nested Loop  (cost=0.29..2688.63 rows=109 width=257) (actual time=57.825..16100.184 rows=3245 loops=1)
                     ->  CTE Scan on customer_total_return ctr1  (cost=0.00..2434.58 rows=109 width=36) (actual time=57.816..16077.175 rows=3255 loops=1)
                           Filter: (ctr_total_return > (SubPlan 2))
                           Rows Removed by Filter: 10102
                           SubPlan 2
                             ->  Aggregate  (cost=7.39..7.40 rows=1 width=32) (actual time=1.199..1.199 rows=1 loops=13357)
                                   ->  CTE Scan on customer_total_return ctr2  (cost=0.00..7.38 rows=2 width=32) (actual time=0.031..1.144 rows=383 loops=13357)
                                         Filter: (ctr1.ctr_state = ctr_state)
                                         Rows Removed by Filter: 12974
                     ->  Index Scan using customer_pkey on customer  (cost=0.29..2.32 rows=1 width=229) (actual time=0.006..0.006 rows=1 loops=3255)
                           Index Cond: (c_customer_sk = ctr1.ctr_customer_sk)
               ->  Index Scan using customer_address_pkey on customer_address  (cost=0.29..0.33 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=3245)
                     Index Cond: (ca_address_sk = customer.c_current_addr_sk)
                     Filter: (ca_state = 'IL'::bpchar)
                     Rows Removed by Filter: 1
 Planning time: 3.093 ms
 Execution time: 16112.860 ms
(34 rows)

EXPLAIN RESULT with enable_nestloop=off;

http://explain.depesz.com/s/KlR

    QUERY PLAN                                                                                                                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=14098.69..14098.70 rows=3 width=253) (actual time=16505.885..16505.895 rows=92 loops=1)
   CTE customer_total_return
     ->  HashAggregate  (cost=6236.00..6240.10 rows=328 width=13) (actual time=276.027..278.330 rows=13357 loops=1)
           Group Key: web_returns.wr_returning_customer_sk, customer_address_1.ca_state
           ->  Hash Join  (cost=4455.76..6233.54 rows=328 width=13) (actual time=246.246..268.881 rows=13517 loops=1)
                 Hash Cond: (customer_address_1.ca_address_sk = web_returns.wr_returning_addr_sk)
                 ->  Seq Scan on customer_address customer_address_1  (cost=0.00..1587.00 rows=50000 width=7) (actual time=0.002..4.910 rows=50000 loops=1)
                 ->  Hash  (cost=4451.47..4451.47 rows=343 width=14) (actual time=246.228..246.228 rows=13517 loops=1)
                       Buckets: 1024  Batches: 1  Memory Usage: 632kB
                       ->  Merge Join  (cost=1426.00..4451.47 rows=343 width=14) (actual time=98.441..244.338 rows=13862 loops=1)
                             Merge Cond: (web_returns.wr_returned_date_sk = date_dim.d_date_sk)
                             ->  Index Scan using idx_wr_returned_date_sk on web_returns  (cost=0.29..2755.33 rows=71763 width=18) (actual time=0.004..92.517 rows=59301 loops=1)
                             ->  Index Scan using date_dim_pkey on date_dim  (cost=0.29..2905.95 rows=365 width=4) (actual time=7.100..142.563 rows=13837 loops=1)
                                   Filter: (d_year = 2002)
                                   Rows Removed by Filter: 958434
   ->  Sort  (cost=7858.59..7858.60 rows=3 width=253) (actual time=16505.883..16505.888 rows=92 loops=1)
         Sort Key: customer.c_customer_id, customer.c_salutation, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag, customer.c_birth_day, customer.c_birth_month, customer.c_birth_year, customer.c_birth_country, customer.c_login, customer.c_email_address, customer.c_last_review_date, ctr1.ctr_total_return
         Sort Method: quicksort  Memory: 48kB
         ->  Hash Join  (cost=3425.75..7858.57 rows=3 width=253) (actual time=16482.304..16505.567 rows=92 loops=1)
               Hash Cond: (customer.c_customer_sk = ctr1.ctr_customer_sk)
               ->  Hash Join  (cost=989.81..5370.73 rows=3192 width=225) (actual time=1.643..25.147 rows=3200 loops=1)
                     Hash Cond: (customer.c_current_addr_sk = customer_address.ca_address_sk)
                     ->  Seq Scan on customer  (cost=0.00..3849.00 rows=100000 width=229) (actual time=0.005..9.459 rows=100000 loops=1)
                     ->  Hash  (cost=969.86..969.86 rows=1596 width=4) (actual time=1.620..1.620 rows=1596 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 57kB
                           ->  Bitmap Heap Scan on customer_address  (cost=21.58..969.86 rows=1596 width=4) (actual time=0.280..1.424 rows=1596 loops=1)
                                 Recheck Cond: (ca_state = 'IL'::bpchar)
                                 Heap Blocks: exact=831
                                 ->  Bitmap Index Scan on idx_customer_address_1  (cost=0.00..21.18 rows=1596 width=0) (actual time=0.191..0.191 rows=1596 loops=1)
                                       Index Cond: (ca_state = 'IL'::bpchar)
               ->  Hash  (cost=2434.58..2434.58 rows=109 width=36) (actual time=16479.695..16479.695 rows=3245 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 140kB
                     ->  CTE Scan on customer_total_return ctr1  (cost=0.00..2434.58 rows=109 width=36) (actual time=285.509..16478.614 rows=3255 loops=1)
                           Filter: (ctr_total_return > (SubPlan 2))
                           Rows Removed by Filter: 10102
                           SubPlan 2
                             ->  Aggregate  (cost=7.39..7.40 rows=1 width=32) (actual time=1.212..1.212 rows=1 loops=13357)
                                   ->  CTE Scan on customer_total_return ctr2  (cost=0.00..7.38 rows=2 width=32) (actual time=0.031..1.157 rows=383 loops=13357)
                                         Filter: (ctr1.ctr_state = ctr_state)
                                         Rows Removed by Filter: 12974
 Planning time: 8.813 ms
 Execution time: 16506.079 ms
(42 rows)

I created indexes on all the necessary fields, but nothing seems to help. With just 1 GB of data I would expect this to take less than 5 seconds; on 100 GB it currently takes 2 days!

I have set work_mem = 1000MB. I am fairly new to interpreting query plans and would like to know how to improve the execution time. Thanks in advance!
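
Both plans spend nearly all of their time re-running the correlated subquery against the CTE (SubPlan 2 runs 13,357 times, once per CTE row). Not your query as posted, just a sketch of one common rewrite: compute the per-state threshold once in a second CTE and join to it, instead of re-scanning customer_total_return for every row (select list abbreviated here):

with customer_total_return as (
   select wr_returning_customer_sk as ctr_customer_sk,
          ca_state                 as ctr_state,
          sum(wr_return_amt)       as ctr_total_return
   from   web_returns
   join   date_dim         on wr_returned_date_sk  = d_date_sk
   join   customer_address on wr_returning_addr_sk = ca_address_sk
   where  d_year = 2002
   group  by wr_returning_customer_sk, ca_state
), state_threshold as (
   select ctr_state, avg(ctr_total_return) * 1.2 as threshold
   from   customer_total_return
   group  by ctr_state
)
select c_customer_id, ctr1.ctr_total_return
from   customer_total_return ctr1
join   state_threshold st on st.ctr_state = ctr1.ctr_state
join   customer           on c_customer_sk = ctr1.ctr_customer_sk
join   customer_address   on ca_address_sk = c_current_addr_sk
where  ctr1.ctr_total_return > st.threshold
  and  ca_state = 'IL'
order  by c_customer_id
limit  100;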

Managing and speeding up queries on PostgreSQL table with over 3 trillion rows


I have time series data which spans over 10 years and has over 3 trillion rows and 10 columns.

At the moment I use a PCIe SSD and 128 GB of RAM, and I am finding that querying takes a significant amount of time. For example, running the command below takes well over 15 minutes:

SELECT * FROM table WHERE column_a = 'value1' AND column_b = 'value2';

The table is mostly used for reads. The only time the table is written to is during weekly updates which insert about 15 million rows.

What is the best way to manage tables this large? Would you recommend splitting the table by year?

The table size is 542 GB and the external size is 109 GB.

EXPLAIN (BUFFERS, ANALYZE) output:

"Seq Scan on table  (cost=0.00..116820941.44 rows=758 width=92) (actual time=0.011..1100643.844 rows=667 loops=1)"
"  Filter: (("COLUMN_A" = 'Value1'::text) AND ("COLUMN_B" = 'Value2'::text))"
"  Rows Removed by Filter: 4121893840"
"  Buffers: shared hit=2 read=56640470 dirtied=476248 written=476216"
"Total runtime: 1100643.967 ms"

The table was created using the following code:

CREATE TABLE table_name
(
  "DATE" timestamp with time zone,
  "COLUMN_A" text,
  "COLUMN_B" text,
  "VALUE_1" double precision,
  "VALUE_2" double precision,
  "VALUE_3" double precision,
  "VALUE_4" double precision,
  "VALUE_5" double precision,
  "VALUE_6" double precision,
  "VALUE_7" double precision
)
WITH (
  OIDS=FALSE
);
ALTER TABLE table_name
  OWNER TO user_1;

CREATE INDEX "ix_table_name_DATE"
  ON table_name
  USING btree
  ("DATE");

Performance issues with inherited tables and indices


I have a PostgreSQL database with a master table and 2 child tables.
My master table:

CREATE TABLE test (
    id serial PRIMARY KEY, 
    date timestamp without time zone
);
CREATE INDEX ON test(date);

My child tables:

CREATE TABLE test_20150812 (
    CHECK ( date >= DATE '2015-08-12' AND date < DATE '2015-08-13' )
) INHERITS (test);

CREATE TABLE test_20150811 (
    CHECK ( date >= DATE '2015-08-11' AND date < DATE '2015-08-12' )
) INHERITS (test);

CREATE INDEX ON test_20150812(date);
CREATE INDEX ON test_20150811(date);

When I execute a query like:

select * from test_20150812 where date > '2015-08-12' order by date desc;

It returns very quickly (20-30 milliseconds). EXPLAIN output:

 Limit  (cost=0.00..2.69 rows=50 width=212)
   ->  Index Scan Backward using test_20150812_date_idx on test_20150812  (cost=0.00..149538.92 rows=2782286 width=212)
         Index Cond: (date > '2015-08-12 00:00:00'::timestamp without time zone)

However, if I execute a query like:

select * from test where date > '2015-08-12' order by date desc;

It takes a long time (10-15 seconds). EXPLAIN output:

 Limit  (cost=196687.06..196687.19 rows=50 width=212)
   ->  Sort  (cost=196687.06..203617.51 rows=2772180 width=212)
         Sort Key: public.test.date
         ->  Result  (cost=0.00..104597.24 rows=2772180 width=212)
               ->  Append  (cost=0.00..104597.24 rows=2772180 width=212)
                     ->  Seq Scan on test  (cost=0.00..0.00 rows=1 width=1857)
                           Filter: (date > '2015-08-12 00:00:00'::timestamp without time zone)
                     ->  Seq Scan on test_20150812 test  (cost=0.00..104597.24 rows=2772179 width=212)
                           Filter: (date > '2015-08-12 00:00:00'::timestamp without time zone)

constraint_exclusion is set to on in my postgresql.conf, so the query should only need to scan test_20150812.

I see that if a query is executed against the master table, the indexes are never used. How can I improve this? I want to run all my queries against the master table, and when querying for a specific date I expect no performance difference between querying the master table and querying the child table directly.

Where to store lat/lon data with additional informations?


I have a huge database of lat/lon GPS coordinate points. For every point I have several other variables (height, surface type, temperature, pressure, ...).

The data also changes every day: old points are dropped and no history is kept.

For a given point, I want to find the N closest points in my database. Should I use PostGIS for this, or is plain PostgreSQL or MySQL enough? I don't need any other spatial operations, only finding the closest points (and not even by real distance in km). Plain database systems support this too, and I am not sure whether PostGIS would be overkill here.
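
For scale, this is the kind of query PostGIS answers with an index-assisted nearest-neighbour search (a sketch only; table name, column names and coordinates are placeholders):

-- GiST index plus the <-> distance operator gives an indexed K-nearest-neighbour search
CREATE INDEX points_geom_idx ON points USING gist (geom);

SELECT id, height, surface_type
FROM points
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(14.42, 50.08), 4326)
LIMIT 10;  -- the 10 closest points to the given lon/lat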

Postgres data integration


Consider a scenario where I have 3 standalone Postgres instances installed on three separate machines.

The data on these machines is inserted, updated, or removed each month by employees. Changes made to each database must be sent and integrated into a master instance by means of a USB drive (all the work is done offline).

One way is to copy all the data from the 3 machines each month and load it into the master machine. Is there a more practical solution or technology for this in Postgres?

How to create a trigger for multiple schemas?


One of my PostgreSQL databases contains different schemas which share the same structure.

-- schema region_a
CREATE TABLE region_a.point (
gid serial NOT NULL,
geom geometry(point, SRID),
attribute_sample varchar(255),
CONSTRAINT point_pkey PRIMARY KEY (gid)
);

CREATE TABLE region_a.polygon (
gid serial NOT NULL,
geom geometry(polygon, SRID),
attribute varchar(255),
CONSTRAINT polygon_pkey PRIMARY KEY (gid)
);

-- schema region_b
CREATE TABLE region_b.point (
gid serial NOT NULL,
geom geometry(point, SRID),
attribute_sample varchar(255),
CONSTRAINT point_pkey PRIMARY KEY (gid)
);

CREATE TABLE region_b.polygon (
gid serial NOT NULL,
geom geometry(polygon, SRID),
attribute varchar(255),
CONSTRAINT polygon_pkey PRIMARY KEY (gid)
);

-- schema region_c
-- ...

Now I wonder how to create a trigger that, whenever a point is inserted or updated, samples the attribute of the polygon it falls within, per schema.

CREATE OR REPLACE FUNCTION sample_attribute_from_polygon()
RETURNS trigger AS $body$
    BEGIN
        NEW.attribute_sample = (SELECT attribute FROM polygon
        WHERE ST_Within(NEW.geom, polygon.geom));
        RETURN NEW;
    END;
$body$ LANGUAGE plpgsql; 

CREATE TRIGGER sample_attribute_from_polygon_tg BEFORE INSERT OR UPDATE
ON point FOR EACH ROW
EXECUTE PROCEDURE sample_attribute_from_polygon();

Is there any way to use the same trigger for all schemas? I'm looking for a solution that keeps working when a schema is renamed.
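
One possible direction (a sketch, not the only way): keep a single trigger function in a shared schema and let it resolve the polygon table through TG_TABLE_SCHEMA, the schema of the table the trigger fired on, so it keeps working after a schema is renamed. The trigger itself still has to be created once per point table:

CREATE OR REPLACE FUNCTION public.sample_attribute_from_polygon()
RETURNS trigger AS $body$
BEGIN
    -- look up the polygon in the same schema as the point that fired the trigger
    EXECUTE format(
        'SELECT attribute FROM %I.polygon WHERE ST_Within($1, geom)',
        TG_TABLE_SCHEMA)
    INTO NEW.attribute_sample
    USING NEW.geom;
    RETURN NEW;
END;
$body$ LANGUAGE plpgsql;

CREATE TRIGGER sample_attribute_from_polygon_tg BEFORE INSERT OR UPDATE
ON region_a.point FOR EACH ROW
EXECUTE PROCEDURE public.sample_attribute_from_polygon();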

Pagination – Text comparison with greater than and less than with DESC


I am implementing a seek method for pagination and am wondering how best to query on a text column with DESC ordering. The queries for this seek approach use less-than or greater-than comparisons depending on whether you are sorting ASC or DESC. This works great for integers and dates, but I am wondering how best to do it with text columns, specifically for the first page.

For example, for the first page when sorting by name it would be

SELECT *
FROM users
WHERE first_name > ''
ORDER BY first_name ASC
LIMIT 5;

Then the next page would be

SELECT *
FROM users
WHERE first_name > 'Caal'
ORDER BY first_name ASC
LIMIT 5;

This works great. I am unsure about DESC order, though. The following seems to work, but I am unsure whether it is "correct".

SELECT *
FROM users
WHERE last_name < 'ZZZ'
ORDER BY last_name DESC
LIMIT 5;

Second page

SELECT  *
FROM    users
WHERE   last_name < 'Smith'
ORDER BY last_name DESC
LIMIT 5;

P.S. I am using the jOOQ support for the seek method and would prefer not to hack around the native support, so ideally there is a proper value to put in the 'ZZZ' place above; i.e. the WHERE part of the clause is mandatory.
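
One detail worth noting for the later pages (a sketch, not jOOQ-specific): a row-value comparison with a unique tie-breaker keeps pages stable when several rows share the same name; the values come from the last row of the previous page, and 42 here is a placeholder id:

SELECT *
FROM users
WHERE (last_name, id) < ('Smith', 42)
ORDER BY last_name DESC, id DESC
LIMIT 5;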

To minimize Cache misses in PostgreSQL?


You can calculate cache misses as described here; however, I am interested in how to minimize the phenomenon. I have some algorithms based on hash tables, which cause many cache misses because of their random access patterns. I am interested in how cache misses can be minimized in PostgreSQL.

How can you minimize Cache misses in PostgreSQL by design?
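
If "cache" here means PostgreSQL's shared buffers (rather than CPU caches), the miss rate can at least be observed from the statistics views; a sketch:

SELECT datname,
       blks_read,   -- blocks that had to come from outside shared buffers
       blks_hit,    -- blocks found in shared buffers
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database;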

Combine two event tables into a single timeline


Given two tables:

CREATE TABLE foo (ts timestamp, foo text);
CREATE TABLE bar (ts timestamp, bar text);

I wish to write a query that returns values for ts, foo, and bar representing a unified view of the most recent values. In other words, if foo contained:

ts | foo
--------
1  | A
7  | B

and bar contained:

ts | bar
--------
3  | C
5  | D
9  | E

I want a query that returns:

ts | foo | bar
--------------
1  | A   | null
3  | A   | C
5  | A   | D
7  | B   | D
9  | B   | E

If both tables have an event at the same time, the order does not matter.

I have been able to create the structure needed using union all and dummy values:

SELECT ts, foo, null as bar FROM foo
UNION ALL SELECT ts, null as foo, bar FROM bar

which will give me a linear timeline of new values, but I'm not quite able to work out how to fill in the null values from the previous rows. I've tried the lag window function, but AFAICT it only looks a fixed number of rows back, not back to the last non-null value. I've looked at recursive CTEs, but I'm not quite sure how to set up the start and termination conditions.
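
One approach that works here (a sketch of the usual "carry the last non-null value forward" trick, no recursion needed): count the non-null values seen so far to form groups, then take the single non-null value within each group:

with merged as (
  select ts, foo, null::text as bar from foo
  union all
  select ts, null::text as foo, bar from bar
), marked as (
  select ts, foo, bar,
         count(foo) over (order by ts) as foo_grp,   -- increments each time a new foo appears
         count(bar) over (order by ts) as bar_grp
  from marked_source_placeholder_removed, merged
)
select ts,
       max(foo) over (partition by foo_grp) as foo,
       max(bar) over (partition by bar_grp) as bar
from marked
order by ts;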


To find data in Multiple tables and delete it?


I am using a PostgreSQL server. Suppose I have 200 values in a master table that need to be deleted. There are 9 other tables referencing this master table. The other 9 tables not only reference the master table, but reference other tables as well. The 200 values in the master table are spread out across the other 9 tables.

My question: I want to find out whether any of the 200 values are present in any of the 10 tables and delete them.

Suppose the master table has id = 1 and this record needs to be deleted. Apart from deleting the row in this table, I need to check whether this value is present in the remaining 9 tables and delete those records as well.

How can I do this?
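
If the 9 referencing tables declare foreign keys to the master table, one standard approach (a sketch; constraint, table and column names are placeholders) is to let the database cascade the deletes:

ALTER TABLE child_table
    DROP CONSTRAINT child_table_master_id_fkey,
    ADD CONSTRAINT child_table_master_id_fkey
        FOREIGN KEY (master_id) REFERENCES master_table (id) ON DELETE CASCADE;

-- after doing the same for each referencing table, a single delete on the master
-- removes the matching rows everywhere
DELETE FROM master_table WHERE id IN (1 /* , ... the other ids */);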

postgresql – how and why indexes are bigger than their tables


I’m using postgresql 9.3 and trying to understand how and why indexes are bigger than their tables.

Sample output:

 database_name | database_size |                          table_name                          | table_size | indexes_size | total_size
---------------+---------------+--------------------------------------------------------------+------------+--------------+------------
 foo_12345 | 412 MB        | "foobar_dev_12345"."fact_mobile_sends"                       | 57 MB      | 131 MB       | 189 MB
 foo_12345 | 412 MB        | "foobar_dev_12345"."fact_mobile_started"                      | 17 MB      | 39 MB        | 56 MB
 foo_12345 | 412 MB        | "foobar_dev_12345"."fact_mobile_stopped"                      | 16 MB      | 35 MB        | 51 MB

I’m running the following query to get the table and index sizes.

SELECT
    table_catalog AS database_name,
    pg_size_pretty(pg_database_size(current_database())) As database_size,
    table_name,
    pg_size_pretty(table_size) AS table_size,
    pg_size_pretty(indexes_size) AS indexes_size,
    pg_size_pretty(total_size) AS total_size
FROM (
    SELECT
        table_catalog,
        pg_database_size(current_database()) AS database_size,
        table_name,
        pg_table_size(table_name) AS table_size,
        pg_indexes_size(table_name) AS indexes_size,
        pg_total_relation_size(table_name) AS total_size
    FROM (
        SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name, table_catalog
        FROM information_schema.tables
    ) AS all_tables
    ORDER BY total_size DESC
) AS pretty_sizes;

Is my query correct? What would cause indexes to be bigger?
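
To see which index is responsible, it may help to look at the per-index sizes for one of those tables (a sketch; fact_mobile_sends is taken from the output above):

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'fact_mobile_sends'
ORDER BY pg_relation_size(indexrelid) DESC;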

Comparing the data in two Postgres databases


I am looking for a way to compare the data in two Postgres databases.

Basically a ‘Before’ and ‘After’ snapshot of a transaction being posted. What I am looking for is all the tables/records that have been altered by a particular type of transaction.

The solution could use ODBC, or PostgreSQL directly. I have connections to the databases both ways.

Use pg_dump with postgis extensions?


I’m using Postgres 9.4. I’m trying to dump a database (both the schema and the actual data) with the PostGIS extensions (set up using CREATE EXTENSION):

$ pg_dump prescribing -U prescribing -h localhost -Fc > prescribing.dump
Password:

But when I type in the db password, I see this:

pg_dump: [archiver (db)] query failed: ERROR:  permission denied for schema topology
pg_dump: [archiver (db)] query was: LOCK TABLE topology.topology IN ACCESS SHARE MODE

How can I dump this database?

UPDATE: it says these tables are owned by postgres:

 public | frontend_sha                               | table             | prescribing
 public | geography_columns                          | view              | postgres
 public | geometry_columns                           | view              | postgres
 public | pg_stat_statements                         | view              | postgres
 public | raster_columns                             | view              | postgres
 public | raster_overviews                           | view              | postgres
 public | spatial_ref_sys                            | table             | postgres
 public | vw_chemical_summary_by_ccg                 | materialized view | prescribing

Maybe I could just dump the other tables, and import them into another database set up in the same way?
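
If nothing in the topology schema needs to be preserved, one option (a sketch; check what actually lives in that schema first) is to exclude it from the dump:

pg_dump prescribing -U prescribing -h localhost -Fc --exclude-schema=topology > prescribing.dump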

How to create MBtiles from geoserver


How do I generate MBTiles in GeoServer? I am using OpenLayers to display a GeoServer layer.

For example, I call a WMS layer like this:

new OpenLayers.Layer.WMS("Kanpur", "http://localhost:8080/geoserver/wms",
    {
        LAYERS: 'sample_data_old:sample',
        STYLES: '',
        format: 'image/jpeg',
        tiled: false,
        transparent: true,
        tilesOrigin: map.maxExtent.left + ',' + map.maxExtent.bottom,
        visibility: false
    },
    {
        isBaseLayer: true
    }
);

How can I call WPS to create the MBTiles?

Please refer to this link for more background on my question:

http://docs.geoserver.org/stable/en/user/community/mbtiles/output.html
