Understanding row estimation for timestamp in postgresql

January 2, 2016, 12:00 am

PostgreSQL 9.4

I have a table (call it tbl) with a column registration_date of type timestamp. For simplicity, I set the following statistics target:

ALTER TABLE tbl ALTER COLUMN registration_date SET STATISTICS 2;
ANALYZE tbl;

and then I got the following statistics for the column (from pg_stats):

avg_width  n_distinct  most_common_vals  most_common_freqs
    8          -1             ""                 ""

histogram_bounds
2012-03-26 10:32:15.379,2013-11-05 19:33:09.828,2015-11-19 14:39:04.676

And the query

SELECT reltuples,relpages FROM pg_class WHERE relname = 'tbl'

returns this:

reltuples relpages
 240656     2476

Now, I execute the query:

EXPLAIN ANALYZE SELECT *
FROM tbl
WHERE tbl.registration_date > '2014-11-11'; -- 2014-11-11 is not in histogram_bounds

and got the following plan:

Seq Scan on tbl  (cost=0.00..5484.20 rows=60441 width=47) (actual time=3.872..26.406 rows=604 loops=1)
  Filter: (registration_date > '2014-11-11 00:00:00'::timestamp without time zone)
  Rows Removed by Filter: 240052
Planning time: 0.119 ms
Execution time: 26.439 ms

QUESTION: How is the estimated number of rows computed in my case? I don’t understand the row count of 60441 in the plan. Note that

SELECT reltuples/4 FROM pg_class WHERE relname = 'tbl'

returns 60164 which is less than 60441.
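
For what it’s worth, the 60441 figure can be reproduced by hand under one assumption: that the planner treats each of the two histogram buckets as holding half of the rows and interpolates linearly inside the bucket that contains the constant (with no NULLs and no most-common values, as in the pg_stats output above). A minimal sketch of that arithmetic in Python; the function and variable names are my own, not anything from PostgreSQL:

```python
from datetime import datetime

# Histogram bounds and reltuples copied from the pg_stats / pg_class output above.
bounds = [
    datetime(2012, 3, 26, 10, 32, 15, 379000),
    datetime(2013, 11, 5, 19, 33, 9, 828000),
    datetime(2015, 11, 19, 14, 39, 4, 676000),
]
reltuples = 240656
cutoff = datetime(2014, 11, 11)


def estimate_rows_greater_than(value, bounds, reltuples):
    """Assumed model: equal-frequency buckets, linear interpolation inside a bucket."""
    buckets = len(bounds) - 1
    if value <= bounds[0]:
        return reltuples
    if value >= bounds[-1]:
        return 0
    for i in range(buckets):
        lo, hi = bounds[i], bounds[i + 1]
        if lo <= value < hi:
            # Part of this bucket above the value, plus every bucket to its right.
            part = (hi - value) / (hi - lo)
            selectivity = (part + (buckets - 1 - i)) / buckets
            return round(reltuples * selectivity)


estimate = estimate_rows_greater_than(cutoff, bounds, reltuples)
print(estimate)
```

With these inputs the sketch gives 60441. The constant sits just under halfway through the second bucket, so slightly more than half of that bucket (roughly 25.1% of the table) is expected to match, which would explain why the estimate is a little above reltuples/4.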

↧

Restore PostgreSQL database (or db names) to new version, from files?

January 3, 2016, 12:00 am

Scenario in short:

a development laptop broke
old HDD is still readable
new laptop has PostgreSQL 9.4 instead of 9.1
both laptops use a flavor of Ubuntu Linux

Question 1: Is my understanding correct that in order to restore the old data, I would need a server with the same major+minor version as the one that wrote the old data directory? Since no 9.1 packages are available for my distribution, I would have to compile a v9.1.x server from source, copy over the old data, start the server and perform a normal pg_dump, which could then be restored to the new cluster?

Question 2: There were around 10-15 databases on the old laptop, but since it was a development machine, in theory all of the data should be replaceable, apart from some local experiments. I’m thinking of just scrapping the old data, but I can’t remember with 100% certainty what those databases were. Is there a way to extract some basic information (such as database names, maybe even sizes or timestamps) from the old data directory without running a 9.1 server?
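
Regarding Question 2, part of this information appears to be readable straight from the files: as far as I know, a data directory contains a PG_VERSION text file, and each database gets its own subdirectory under base/ named after its OID (not its name). A minimal Python sketch under those assumptions; the helper name summarize_data_dir and the example path are hypothetical, and it only reports OIDs, sizes and modification times, not database names:

```python
import os
from datetime import datetime


def summarize_data_dir(data_dir):
    """Return (version, [(oid, size_bytes, last_modified), ...]) for a stopped cluster."""
    with open(os.path.join(data_dir, "PG_VERSION")) as f:
        version = f.read().strip()

    databases = []
    base = os.path.join(data_dir, "base")
    for entry in sorted(os.listdir(base)):
        path = os.path.join(base, entry)
        # Per-database directories are named by numeric OID; skip anything else.
        if not (entry.isdigit() and os.path.isdir(path)):
            continue
        size, newest = 0, 0.0
        for root, _dirs, files in os.walk(path):
            for name in files:
                st = os.stat(os.path.join(root, name))
                size += st.st_size
                newest = max(newest, st.st_mtime)
        databases.append((entry, size, datetime.fromtimestamp(newest)))
    return version, databases


if __name__ == "__main__":
    # Hypothetical mount point of the old disk; adjust to the real location.
    old_dir = "/mnt/old-hdd/var/lib/postgresql/9.1/main"
    if os.path.isdir(old_dir):
        version, dbs = summarize_data_dir(old_dir)
        print("cluster version:", version)
        for oid, size, mtime in dbs:
            print(oid, size, mtime)
```

This only narrows things down by size and last-modified time; mapping an OID back to a database name would still need the catalog, which this sketch does not attempt to read.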

↧

psql “invalid client_encoding” error on OS X, PostgreSQL 9.4.5

January 3, 2016, 9:52 am

I’m currently running an OS X Lion Server system which ships with a built-in, non-upgradable PostgreSQL version.
After years of usage I’ve finally decided to leave the built-in version and install an independent one. I disabled the built-in installation, downloaded the installer from EDB and followed the wizard. After many issues regarding encoding and locales, I finally managed to set up a DB with no locale and UTF8 encoding. I issued the following command:

initdb -D /path/to/data --no-locale --encoding=UTF8

If I connect using pgAdminIII I get no problems. The command show client_encoding; displays UNICODE as the encoding used by pgAdminIII (the default installation gave me a SQL_ASCII encoding and that’s why I ran the initdb command).

The problem is that I’m not able to connect to PostgreSQL using psql. Whatever I pass to it, I get the following error:

psql: invalid connection option "client_encoding"

I’ve searched through the Internet but found nothing that solves my problem (for example issuing env PGCLIENTENCODING=UTF8 and adding client_encoding=UTF8 to postgresql.conf).

otool -L /Library/PostgreSQL/9.4/bin/psql returns:

/Library/PostgreSQL/9.4/bin/psql:
    @loader_path/../lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.7.0)
    @loader_path/../lib/libssl.1.0.0.dylib (compatibility version 1.0.0, current version 1.0.0)
    @loader_path/../lib/libedit.0.dylib (compatibility version 1.0.0, current version 1.48.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

Can anyone help me figure it out?

Many thanks to all
Pietro

UPDATE

I forgot to log out and log back in after editing the bash profile. The suggestion made by Daniel Vérité was right. I just set the DYLD_LIBRARY_PATH env variable in /etc/profile in order to make it visible at a global level and not only from the interactive shell.

I added the following line to /etc/profile:

export DYLD_LIBRARY_PATH='/Library/PostgreSQL/9.4/lib'

I hope this helps others.

Daniel, Thank you a lot

↧

postgresql out of memory [closed]

January 3, 2016, 3:00 pm

I am clipping 70 rasters with a polygon in PostgreSQL, and the result is: out of memory, Fatal on request size of 10872

I read some solutions that suggest changing postgresql.conf parameters, but nothing worked for me.

My machine runs Ubuntu with 8GB RAM; the config is:

#shared_buffers = 1024MB                                                     
#huge_pages = try                                                          
#temp_buffers = 512MB                 
#max_prepared_transactions = 0         
#work_mem = 64MB                       
#maintenance_work_mem = 512MB          
#autovacuum_work_mem = -1

Is there any solution for this problem?
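
One thing that may be worth checking first: every line in the fragment above starts with #, and postgresql.conf treats # as the start of a comment, so none of these values would be in effect as shown. A small Python sketch of that check, assuming the simple name = value line format; the function name active_settings is made up for illustration:

```python
def active_settings(conf_text):
    """Return settings from postgresql.conf-style text, ignoring '#' comments."""
    settings = {}
    for line in conf_text.splitlines():
        # Everything after '#' is a comment (quoted '#' characters are not handled here).
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


fragment = """
#shared_buffers = 1024MB
#huge_pages = try
#temp_buffers = 512MB
#max_prepared_transactions = 0
#work_mem = 64MB
#maintenance_work_mem = 512MB
#autovacuum_work_mem = -1
"""

# As written, the fragment contributes no active settings at all.
print(active_settings(fragment))
# Only an uncommented line would count as a setting.
print(active_settings("work_mem = 64MB  # per sort/hash operation"))
```

In a running session, SHOW work_mem; (and likewise for the other parameters) reports the value actually in use.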

↧

How to drop all of my functions in postgres in single step

January 3, 2016, 4:00 pm

Right now I have to use a query to write the commands to a text file, then remove the double quotes from them, and finally run that file in the psql shell.
Here are my steps.
1. Use the query from
http://stackoverflow.com/questions/10591113/drop-all-functions-from-postgres-database

SELECT 'DROP FUNCTION ' || ns.nspname || '.' || proname 
       || '(' || oidvectortypes(proargtypes) || ');'
FROM pg_proc INNER JOIN pg_namespace ns ON (pg_proc.pronamespace = ns.oid)
WHERE ns.nspname = 'dugong'  order by proname;

2. Remove the double quotes with sed

sed 's/"//g' file.txt

3. Finally, copy and paste the result into the psql shell.

My question:
How can I drop all of my functions in Postgres in a single step?

↧

pgpool load balancing is sending all queries only to master

January 3, 2016, 9:00 pm

My two postgresql servers are configured for streaming replication, which is working fine.

Pgpool is configured for Master Slave mode / Load Balance Mode.

pgpool.conf:

listen_addresses = '*'
port = 9999
backend_hostname0 = 'master-postgres-ip'
backend_port0 = port-no
backend_weight0 = 1
backend_data_directory0 = 'data-dir'
backend_hostname1 = 'slave-postgres-ip'
backend_port1 = port-no
backend_weight1 = 1
backend_data_directory1 = 'data-dir'
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode='stream'

I expected that all write queries would go to the primary and that read queries would be distributed between the two. However, all queries are going only to the master. If I stop the master, queries go to the slave.

Can somebody tell me what might be going wrong?

pgpool gives the following log on startup:

2015-11-03 17:25:56: pid 21284: LOG:  find_primary_node: checking backend no 0
2015-11-03 17:25:56: pid 21284: LOG:  find_primary_node: checking backend no 1
2015-11-03 17:25:56: pid 21284: DEBUG:  SSL is requested but SSL support is not available
2015-11-03 17:25:56: pid 21284: DEBUG:  authenticate kind = 3
2015-11-03 17:25:56: pid 21284: ERROR:  failed to authenticate
2015-11-03 17:25:56: pid 21284: DETAIL:  invalid authentication message response type, Expecting 'R' and received 'E'
2015-11-03 17:25:56: pid 21284: DEBUG:  find_primary_node: no primary node found

↧

Using SELECT to call a function in a jdbc migration

January 4, 2016, 10:00 am

I’m looking to set up some Postgres/PostGIS migrations with clojure/jdbc.

Running side-effecting functions with SELECT is proving to be an issue: most migration libs eventually throw the "A result was returned when none was expected" error, because at some point they use clojure.java.jdbc/execute! or clojure.java.jdbc/db-do-commands. That seems understandable, but it is frustrating when you need to call a function that’s critical to the migration.

The PostGIS docs encourage using a SELECT statement to create a spatial column:

SELECT AddGeometryColumn('parks', 'park_geom', 128, 'MULTIPOLYGON', 2 );

Has anyone run into this or found an appropriate workaround for using functions in a clojure/jdbc and Postgres migration?

Related tidbits:

This description for manually registering a spatial column looks promising, but seems remarkably heavy-handed for something that already has a supporting function.
There’s also PL/pgSQL’s PERFORM statement, which I stumbled across, but it feels like I’m grasping at straws at that point, despite it looking promising.
The clojure.java.jdbc/execute! docs give a specific heads-up about only using “general (non-select) SQL operation[s]”.

↧

Generate incrementing IDs without sequences

January 4, 2016, 10:00 am

Suppose there’s a multi-tenant application where users can create some kind of documents with basic structure like

CREATE TABLE users (id SERIAL PRIMARY KEY, email TEXT);
CREATE TABLE documents (
  id SERIAL PRIMARY KEY
, document_id INT NOT NULL
, user_id INT NOT NULL
, text TEXT);

For each user, document_id starts at 1 and increases, preferably with gaps being a rare occurrence. The obvious solution is to create a sequence for each user and get the document_id from there. But according to this, databases don’t behave well when there are lots of relations. Another solution is to store next_document_id in the users table and update it as necessary, but that means the lock on this row will be highly contested, slowing down simultaneous transactions from the same user. Any other ideas?

↧

CREATE TEMPORARY TABLE, temp_buffers and Postgres performance

January 5, 2016, 9:50 am

I have a bunch of queries that use temporary tables, and most of them work really well, but from time to time they take an unusual amount of time: 3-5 seconds.

My temp_buffers is set to the default 8MB, and I’m thinking that maybe the problem is caused by the buffer being swapped to disk when it overflows, or something like that.

Is there a way of checking how much of the buffer is taken? Is my way of thinking at all reasonable? Maybe the allocation of new space is really fast and I should look at other places?

↧

Select query not giving result for inner query values

January 5, 2016, 9:50 am

Here is my schema and data

create table mytable (id numeric, val text);
create table mytable1 (id1 numeric, id text);

insert into mytable values (123, 'aaa');
insert into mytable values (124, 'bbb');
insert into mytable values (125, 'ccc');
insert into mytable1 values (1001, '[123]');
insert into mytable1 values (1002, '[123,124]');
insert into mytable1 values (1003, '[123,124,125]');

When I run the query below, I get the expected result.

select string_to_array(trim(mt1.id, '[]'), ',')::numeric[] from mytable1 mt1 where mt1.id1 = 1003

Result:

123,124,125

But when I pass the above query as the inner query of a select query, I do not get the result:

select mt.val from mytable mt where mt.id = any (select string_to_array(trim(mt1.id, '[]'), ',')::numeric[] from mytable1 mt1 where mt1.id1 = 1003)

Expected result:

val
---
aaa
bbb
ccc

Is anything wrong with the query?

(using Postgresql-9.1)

↧

Find unmatched rows between two tables dynamically

January 5, 2016, 12:00 pm

I have a function here that is supposed to take two tables as arguments and check if they are the same.

create or replace function testing.equal_tables(
    varchar,
    varchar)
    returns void as
$$
begin

    execute 'select *
     from
     (select * from ' || $1 ||'
     except
     select * from ' || $2 || ') a
     union
     (select * from ' || $2 || '
     except
     select * from ' || $1 || ');'
    ;

end;
$$ language plpgsql;

When I call it with these two tables, one with 20 rows and one with 10 rows, I get only the empty set, which is not the correct result:

select testing.equal_tables('ee1', 'ee2');

When I modified the function to return a string of the statement, it came back correctly, but that still doesn’t help because I’d like to execute the return string using a function, prepared statement, or something. Is there any way to make this function work?

↧

Update column with value of another column or another column

January 5, 2016, 12:00 pm

With PostgreSQL (I’m using version 9.1), is it possible to mass-update a column in a single query with the value of another column, falling back to a third column if the other column’s value is null, and to the current datetime if the third one is null as well? (All the columns have type timestamp.)

I need to change

columnA columnB columnC
null    foo     bar
null    null    baz
null    null    null

to

columnA columnB columnC
foo     foo     bar
baz     null    baz
quz     null    null

where quz is the current datetime.
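
The rule described above is a "first non-null value wins" chain, which is what SQL’s COALESCE expresses. To make the intended result unambiguous, here is a small Python model of the rule applied to the three example rows; fill_column_a is a hypothetical helper and None stands in for NULL:

```python
from datetime import datetime


def fill_column_a(row, now):
    """Set columnA to the first non-null of columnB, columnC, or the current time."""
    _a, b, c = row
    for candidate in (b, c):
        if candidate is not None:
            return (candidate, b, c)
    return (now, b, c)


now = datetime.now()
rows = [
    (None, "foo", "bar"),
    (None, None, "baz"),
    (None, None, None),
]
updated = [fill_column_a(row, now) for row in rows]
for row in updated:
    print(row)
```

In SQL the same chain would presumably be written as COALESCE(columnB, columnC, now()) in the SET clause of a single UPDATE.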

↧

What role for plpythonu function's in the file system?

January 5, 2016, 11:00 pm

As a preliminary test for further work, I’m trying to use a simple plpythonu function in PostgreSQL 9.2 to create a folder in my filesystem. So I have this code:

CREATE OR REPLACE FUNCTION "mkdir_test"() 
RETURNS void AS $BODY$ 

import os

dir = os.path.dirname('/tmp/areas/testdir/')
if not os.path.exists(dir):
    os.makedirs(dir)


$BODY$

LANGUAGE plpythonu
COST 100
CALLED ON NULL INPUT
SECURITY INVOKER
VOLATILE;
ALTER FUNCTION "mkdir_test"() OWNER TO "chewbacca";

It works, but the created directory ‘testdir’ belongs to _postgres and has permissions 700, meaning that it is inaccessible to anyone but postgres. How can I change this so that the user triggering this function is the owner of the file/folder created? (Currently I’m doing this on Mac OS X 10.11, but the objective is to have this working on any OS.)
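
On the permissions half of the question, here is a sketch in plain Python (outside the database) of creating the directory with an explicit mode. It assumes the 700 comes from the server process’s umask, which os.makedirs respects, so the sketch calls os.chmod afterwards; the helper name make_shared_dir and the 0o755 mode are illustrative choices. It does not change the owner: the directory would still belong to the OS account the server runs as.

```python
import os
import stat
import tempfile


def make_shared_dir(path, mode=0o755):
    """Create path if needed, then force the leaf directory's permission bits."""
    if not os.path.isdir(path):
        os.makedirs(path)
    # makedirs applies the process umask to new directories; chmod sets the exact
    # bits on the final directory only (intermediate directories are left as created).
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "areas", "testdir")
    print(oct(make_shared_dir(target)))
```

Inside the plpythonu function, the same os.chmod call could follow os.makedirs; whether other users can then reach the directory also depends on the permissions of its parent directories.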

↧

Efficient processing of ST_ValueCount Results in Postgresql

January 6, 2016, 12:50 pm

General Problem: I have a table of rasters. The rasters are classified MSI images, so each pixel is an integer indicating the class of the pixel. For a number of regions, I am querying pixel counts in those regions, i.e., making a table like:

filename | total pixel count | pixels in class 0 | pixels in class 1 | ...

Specific Problem: My problem is that the script takes a long time, so I want to get the runtime down.

What I’ve tried: Full disclosure, I’m not well versed in PostgreSQL. Also, note that in the two code snippets below, the innermost sub-selects are identical. Here’s my first attempt:

EXPLAIN ANALYZE SELECT 
    filename AS filename, 
    ST_Count(rast,1) AS totalpixels,
    (ST_ValueCount(rast,1,false,ARRAY[0.0])).count AS nodata,
    (ST_ValueCount(rast,1,false,ARRAY[1.0])).count AS lowveg,
    (ST_ValueCount(rast,1,false,ARRAY[2.0])).count AS highveg,
    (ST_ValueCount(rast,1,false,ARRAY[12.0])).count AS clouds,
    (ST_ValueCount(rast,1,false,ARRAY[13.0])).count AS shadow
FROM 
( 
    SELECT 
        filename, 
        ST_Clip(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) AS rast 
    FROM 
        rasters 
    WHERE 
        ST_Intersects(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) 
) AS source_rasters;

This runs in 185 ms. I thought surely it’s sub-optimal to call ST_ValueCount so many times. So here’s my improved attempt: running it once and converting the SETOF result to an array so that I can index the values:

EXPLAIN ANALYZE SELECT 
    filename AS filename,
    totalpixels AS totalpixels,
    pxcnt[1] AS nodata,
    pxcnt[2] AS lowveg,
    pxcnt[3] AS highveg,
    pxcnt[4] AS clouds,
    pxcnt[5] AS shadow
FROM 
(
    SELECT 
        filename AS filename,
        ST_Count(rast,1,false) AS totalpixels,
        ARRAY( SELECT count FROM ST_ValueCount(rast,1,false,ARRAY[0.0,1.0,2.0,12.0,13.0]) ) AS pxcnt
    FROM 
    ( 
        SELECT 
            filename, 
            ST_Clip(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) AS rast 
        FROM 
            rasters 
        WHERE 
            ST_Intersects(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) 
    ) AS source_rasters
) AS f;

But it only reduced the runtime to 155 ms.

But then I thought maybe the ST_ValueCount is only a small portion of the total work cost, so this is all the improvement I could expect. However, if I reference just one result:

EXPLAIN ANALYZE SELECT 
    filename AS filename,
    totalpixels AS totalpixels,
    pxcnt[1] AS nodata
FROM 
...

It cuts the runtime down to 55 ms, which I don’t understand because it seems to have done all the work of intersecting, clipping, counting, etc.

Question: So is there a faster way to unpack the results of ST_ValueCount, or an obvious way to speed this up in general?

Just for what it’s worth, I’ve made other incremental improvements since I began, e.g., tiling the rasters was a big improvement. At this point, this seems like the most likely opportunity for a significant improvement, please let me know if I might be wrong about that.

↧

PGBouncer pausing hanging from ltm monitor

January 7, 2016, 12:50 am

Currently I have the following setup:

F5LTM –> PGBx2 –> DBx3

The PGBouncers are set up with priority groups on the F5 so in the event one fails or hangs traffic will still be sent to the database with an equal amount of connections allowed instead of splitting up the pools and having half the connections in the event one were to go down. The problem with this setup is that we’re unable to pause the bouncer due to the F5 monitors making it appear that a connection is still open.

Has anyone done a similar setup with some load balancing device or software handling connection pooler redundancy? Any thoughts or suggestions would be greatly appreciated.

↧

Do data checksums apply to large objects?

January 8, 2016, 12:50 pm

PostgreSQL 9.3 introduced the data checksums feature (initdb -k). Do these checksums apply to large objects?

↧

How to reduce size of PostgreSQL table after dropping columns?

January 8, 2016, 12:50 pm

I have a PostgreSQL database. One table is very large. I want to extract a TEXT column into a separate table and see how much I can reduce the size. The problem is that the size appears to stay the same no matter what I do.

I’m obtaining the size by issuing \dt+ in psql.

I’ve tried VACUUM FULL, and pg_dumpall followed by deleting the database and restoring it.

The size of the table did not change.

I added a second TEXT column, watched the size increase by a few hundred MB, deleted the column, and I cannot get the size to go down again.

How can I get the size of the table to go down after deleting these columns?

↧

How to get CPU and memory usage of PostgreSQL server and queries from database system views?

January 11, 2016, 1:50 am

We are looking for a way to get CPU and memory usage data for PostgreSQL 9.x for monitoring purposes. The requirement is to provide the data at the server, database, session and query level. We are required to use Python and SQL to get the data, and we cannot install anything on the monitored databases, not even extensions. The OS can be Windows or Linux.
There is a lot of info on I/O in the statistics collector tables, but we didn’t find anything related to CPU or memory.

↧

Postgresql Disk Allocation in Begin End Transaction

January 11, 2016, 3:50 am

I have a loop in a transaction block that creates a temporary table, updates it, exports it to a txt file and drops the table at the end of each iteration.
After some time I noticed that free disk space was getting smaller each time the loop continued. When I cancelled the transaction block, the disk space returned to its initial state.

How can I stop the loop from allocating disk space in the following transaction?

do
$$declare
  --variables
begin
     FOR r IN (SELECT ...) LOOP
        CREATE TEMPORARY TABLE TEST .. ;
        --UPDATE STATEMENTS
        --EXPORT (COPIES TABLE INTO TXT FILE)
        DROP TABLE TEST CASCADE;
     END LOOP;
end$$;

↧

Preserve order of array elements after join

January 11, 2016, 9:00 pm

I have a query that returns a CTE looking like

+-----------+-------------+
|   node_id | ancestors   |
|-----------+-------------|
|         1 | []          |
|         2 | []          |
|         3 | [1]         |
|         4 | [2]         |
|         5 | [4, 2]      |
+-----------+-------------+

What I want to do is join with the nodes table and replace the ids that are in the ancestors column with another column on the nodes table. Here’s my query so far:

WITH RECURSIVE tree AS (
  -- snip --
)
SELECT node.entity_id AS id,
       array_remove(array_agg(parent_nodes.entity_id), NULL) AS ancestors
FROM tree
JOIN entity.nodes AS node ON node.id = tree.node_id
LEFT OUTER JOIN entity.nodes AS parent_nodes ON parent_nodes.id = ANY(tree.ancestors)
GROUP BY node.id;

The problem with that query is that it loses the order of the original ancestors array. Is there a way to perform the join while keeping the original order during the array_agg function?

↧