Channel: Question and Answer » postgresql

PostgreSQL Primary key disappears from test table

I have a somewhat convoluted scenario: a test table I had created with a primary key no longer returns that primary key, and pgAdmin III reports that there are no constraints. I have the entire PostgreSQL query log, and below is the query I used to create the test table. I also dropped the primary key on a different test table and used the query pgAdmin III generated (not one I have manually run yet) as a template to search the log for pgAdmin III dropping the primary key on the table in question, and found nothing searching for:

ALTER TABLE public.delete_key_bigserial DROP CONSTRAINT

The string ‘DROP CONSTRAINT’ appears only once in the query log, which dates back to 2014-12-02, weeks before I even created the test tables. I now understand that a primary key column may or may not be bigserial or serial; I have even created a table without a primary key, set id to integer, and then made id the primary key afterwards (another can of worms for a whole other day).

In an earlier question I asked how to fetch the data_type, including whether it was bigserial or serial, and Erwin Brandstetter gave an excellent answer. He provided two queries in particular: one to fetch the data_type of every column, and one to fetch the data_type of the primary key. Unfortunately, one of the test tables I have been testing with isn’t returning any results:

SELECT a.attrelid::regclass::text, a.attname,
CASE a.atttypid
 WHEN 'int'::regtype  THEN 'serial'
 WHEN 'int8'::regtype THEN 'bigserial'
 WHEN 'int2'::regtype THEN 'smallserial'
 END AS serial_type
FROM   pg_attribute  a
JOIN   pg_constraint c ON c.conrelid  = a.attrelid AND c.conkey[1] = a.attnum 
JOIN   pg_attrdef   ad ON ad.adrelid  = a.attrelid
                       AND ad.adnum   = a.attnum
WHERE  a.attrelid = 'delete_key_bigserial'::regclass
AND    a.attnum > 0
AND    NOT a.attisdropped
AND    a.atttypid = ANY('{int,int8,int2}'::regtype[]) -- integer type
AND    c.contype = 'p'                                -- PK
AND    array_length(c.conkey, 1) = 1                  -- single column
AND    ad.adsrc = 'nextval('''
            || (pg_get_serial_sequence (a.attrelid::regclass::text, a.attname))::regclass
            || '''::regclass)';

The query works perfectly on all the other tables.

I’ve only been working with PostgreSQL since November 2014 and MySQL since circa 2011, so the best thing I can do, AFAIK, is gather as much relevant data as I can. Here is the query from the query log that created the delete_key_bigserial table:

CREATE TABLE public.delete_key_bigserial (id bigserial PRIMARY KEY NOT NULL)
WITH (OIDS = FALSE);

I simplified Erwin’s query and ran it against the table, so I could compare the results in my query tool with those from other test tables on which the query works perfectly fine (for all four data_types):

SELECT * FROM pg_attribute a 
WHERE a.attrelid = 'delete_key_bigserial'::regclass
AND a.attnum > 0
AND NOT a.attisdropped
AND attname='id'
ORDER BY a.attnum;

+----------+---------+----------+---------------+--------+--------+----------+-------------+
| attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff |
+----------+---------+----------+---------------+--------+--------+----------+-------------+
| 46390    | id      | 20       | -1            | 8      | 20     | 0        | -1          |
+----------+---------+----------+---------------+--------+--------+----------+-------------+

+-----------+----------+------------+----------+------------+-----------+--------------+
| atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped |
+-----------+----------+------------+----------+------------+-----------+--------------+
| -1        | f        | p          | d        | t          | t         | f            |
+-----------+----------+------------+----------+------------+-----------+--------------+

+------------+-------------+--------------+--------+------------+---------------+
| attislocal | attinhcount | attcollation | attacl | attoptions | attfdwoptions |
+------------+-------------+--------------+--------+------------+---------------+
| t          | 0           |              |        |            |               |
+------------+-------------+--------------+--------+------------+---------------+

Erwin derives the type from the atttypid column when the other conditions are met; however, the resulting row is identical to that of other tables that work. There is another system view I’ve used in my attempts to determine the data_type of the primary key, so I decided to compare the results from it as well via the following query:

SELECT * FROM information_schema.columns
WHERE table_schema NOT IN ('pg_catalog', 'information_schema')
AND table_name='delete_key_bigserial'
AND is_nullable='NO';

The only difference in any returned column/row (besides the table name appearing in the table_name and column_default columns) was the dtd_identifier column. For delete_key_bigserial the query returns dtd_identifier = 20, while for a working table it returns 1. The PostgreSQL documentation (at the bottom of the element_types page) describes the column as:

An identifier of the data type descriptor of the element. This is
currently not useful.

I am guessing this is an older mechanism kept for legacy purposes, though it could simply refer to the description itself? I’m not sure, but this is where I am, and I’m not even certain I’m on the right path.

I’d rather deal with the issue and learn from the scenario than disregard it simply because it’s a test table, since one day I’m certain I’ll have to deal with this when it’s not a test table. I’ll be happy to update my question with any relevant information that may help track down the issue.
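For completeness, here is the direct check against pg_constraint that I plan to run next (my assumption being that if it returns no rows, the primary key constraint really is gone rather than just hidden from pgAdmin):

SELECT conname, contype, conkey
FROM   pg_constraint
WHERE  conrelid = 'delete_key_bigserial'::regclass
AND    contype = 'p';   -- 'p' = primary key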


How to implement a distributed database system [closed]

I have been tasked with finding out how to implement a distributed database system using various database management systems: MS Access 2013, MySQL, and PostgreSQL. How would I implement distribution for a fragmentation case (a SELECT query from a master table in a master database to a local database over a network) across all three DBMSs, in physically separate locations and on different operating systems (Windows, Mac OS, and Linux)? Thank you in advance.

How can I drop and create postgresql views in dependency order?

I found the following: http://stackoverflow.com/a/9712051/61249, which is good but still too much manual labor for me. I need to recreate the views in the right order; how would I go about doing that?

@dbenhur talks about the following:

To be more complete, one could figure out how to query which views
depend on the table(s) you’re modifying and use that query instead of
enumerating view names. Gonna need to understand pg_rewrite | pg_rule
to do that, I think

I am unsure what that means exactly, but I’ll tell you what I need and you can tell me whether it is possible.

I work on a Rails application where I’ve tried to maintain the views and functions as part of the Rails migrations. Unfortunately this became a real mess, so I dumped our views and functions into a separate file for each one. There are two ways forward as I see it. I could use a single file for all of them, which is cumbersome, but I guess the order of the DDL would then be managed by pg_dump.

Or, when I dump the views and functions, I could also query for a dependency tree and store it, then use it later to recreate them in the right order. The reason is that it has become too complex to do manually just for changing the name of a column in a view.
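For what it’s worth, the kind of dependency query I have in mind would look something like this (a rough, untested sketch based on my reading of pg_depend and pg_rewrite; my_table is a placeholder name):

SELECT DISTINCT v.relname AS dependent_view
FROM   pg_depend  d
JOIN   pg_rewrite r ON r.oid = d.objid          -- the view's rewrite rule
JOIN   pg_class   v ON v.oid = r.ev_class       -- the view owning that rule
JOIN   pg_class   t ON t.oid = d.refobjid       -- the object being depended on
WHERE  t.relname = 'my_table'                   -- placeholder: table being changed
AND    v.relkind = 'v'
AND    d.classid = 'pg_rewrite'::regclass;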

How do others maintain their DDL? Anyone ever done anything like what I want to do?

Privileges system table in postgres

Is it possible to know all the tables a user has access to using a single query?

I tried with pg_role and pg_user with no luck. Can anyone tell me about a system table which has the data of all the objects that a user has access to?
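Something along these lines is what I was hoping for (a sketch using information_schema.table_privileges; I am not sure it covers privileges inherited through role membership or grants to PUBLIC, and some_user is a placeholder):

SELECT table_schema, table_name, privilege_type
FROM   information_schema.table_privileges
WHERE  grantee = 'some_user';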

Optimal way to ignore duplicate inserts?

Background

This problem relates to ignoring duplicate inserts using PostgreSQL 9.2 or greater. The reason I ask is because of this code:

  -- Ignores duplicates.
  INSERT INTO
    db_table (tbl_column_1, tbl_column_2)
  SELECT
    unnested_column,
    param_association
  FROM
    unnest( param_array_ids ) AS unnested_column;

The code is unencumbered by checks for existing values. (In this particular situation, the user does not care about errors from inserting duplicates — the insertion should “just work”.) Adding code in this situation to explicitly test for duplicates imparts complications.

Problem

In PostgreSQL, I have found a few ways to ignore duplicate inserts.

Ignore Duplicates #1

Create a transaction that catches unique constraint violations, taking no action:

  BEGIN
    INSERT INTO db_table (tbl_column) VALUES (v_tbl_column);
  EXCEPTION WHEN unique_violation THEN
    -- Ignore duplicate inserts.
  END;

Ignore Duplicates #2

Create a rule to ignore duplicates on a given table:

CREATE OR REPLACE RULE db_table_ignore_duplicate_inserts AS
    ON INSERT TO db_table
   WHERE (EXISTS ( SELECT 1
           FROM db_table
          WHERE db_table.tbl_column = NEW.tbl_column)) DO INSTEAD NOTHING;
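Ignore Duplicates #3

A third pattern I have come across (a sketch; I have not benchmarked it, and without extra locking it is not safe against two sessions inserting the same value concurrently) filters the duplicates in the INSERT itself:

INSERT INTO db_table (tbl_column)
SELECT v_tbl_column
WHERE  NOT EXISTS (
    SELECT 1 FROM db_table WHERE tbl_column = v_tbl_column
);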

Questions

My questions are mostly academic:

  • What method is most efficient?
  • What method is most maintainable, and why?
  • What is the standard way to ignore insert duplication errors with PostgreSQL?
  • Is there a technically more efficient way to ignore duplicate inserts; if so, what is it?

Thank you!

Are my queries to the PostgreSQL hot spare actually being handled there?

I have master and minion database servers each running Arch Linux, with the latter acting as a hot spare database with replication. I wrote a service that is supposed to check whether the replication is working, which works by making SELECT queries to each and comparing the results. As part of my test to see if that service is working, I run systemctl stop postgresql.service on the hot spare and check to see whether my service alerts me that replication is failing. I created an instance of the (RoR) service in the console, which prints the results of its queries on each server. From this, I can see that the hot spare continues to have as many records as the master, so either replication is still working despite postgresql.service being stopped, or (it occurred to me) the queries that I think are taking place on the hot spare are actually being forwarded to the master or something like that. Is that possible?
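One check I am considering adding to the service (a sketch; I have not wired it in yet) is to ask, over the same connections the service already uses, which server is actually answering and whether it is in recovery:

-- On a hot standby this should return t for pg_is_in_recovery(); on the master, f.
-- inet_server_addr() is NULL for Unix-socket connections.
SELECT inet_server_addr() AS answering_server,
       pg_is_in_recovery() AS in_recovery;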

Postgres: Index a query with MAX and GROUP BY

Is there any way to index the following query?

SELECT run_id, MAX ( frame ) , MAX ( time ) FROM run.frames_stat GROUP BY run_id;

I’ve tried creating sorted (non-composite) indexes on frame and time, and an index on run_id, but the query planner doesn’t use them.

Misc info:

  • Unfortunately (and for reasons I won’t get into) I cannot change the query
  • The frames_stat table has 42 million rows
  • The table is unchanging (no further inserts/deletes will ever take place)
  • The query was always slow; it has just gotten slower because this dataset is larger than in the past.
  • There are no indexes on the table
  • We are using Postgres 9.4
  • The db’s “work_mem” size is 128MB (if that’s relevant).
  • Hardware: 130 GB RAM, 10-core Xeon

Schema:

CREATE TABLE run.frame_stat (
  id bigint NOT NULL,
  run_id bigint NOT NULL,
  frame bigint NOT NULL,
  heap_size bigint NOT NULL,
  "time" timestamp without time zone NOT NULL,
  CONSTRAINT frame_stat_pkey PRIMARY KEY (id)
)

Explain analyze:

HashAggregate  (cost=1086240.000..1086242.800 rows=280 width=24) (actual time=14182.426..14182.545 rows=280 loops=1)
  Group Key: run_id
  ->  Seq Scan on zulu  (cost=0.000..770880.000 rows=42048000 width=24) (actual time=0.037..4077.182 rows=42048000 loops=1)
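For reference, the composite indexes I was planning to try next (not yet created; I am not sure the planner can use them for this exact GROUP BY/MAX query without rewriting it):

CREATE INDEX ON run.frames_stat (run_id, frame DESC);
CREATE INDEX ON run.frames_stat (run_id, "time" DESC);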

Dropped column still has value when recreated with Postgres table of 150M rows

I need to reset the column to NULL for every row. Until now, dropping and re-adding the column has done the trick, but for some reason, on this table, which is much larger than the rest, it doesn’t seem to be working:

ALTER TABLE "public"."WorkoutExercises" DROP COLUMN "_etl";
ALTER TABLE "public"."WorkoutExercises" ADD COLUMN "_etl" bool;

However

SELECT
    *
FROM
    "WorkoutExercises"
WHERE
    "_etl" = TRUE
LIMIT 1000;

Returns 1000 results. Why is that, and how can this be fixed?
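To rule out confusion on my part, this is the check I intend to run against pg_attribute (my understanding is that the dropped column should still be listed with attisdropped = true, and the re-added "_etl" should appear with a new, higher attnum):

SELECT attname, attnum, attisdropped
FROM   pg_attribute
WHERE  attrelid = '"WorkoutExercises"'::regclass
AND    attnum > 0
ORDER  BY attnum;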


Pgpool II : unable to read message length between two network interfaces

Start log file:

2015-03-06 01:57:56: pid 2760: LOG:  Setting up socket for 0.0.0.0:9999
2015-03-06 01:57:56: pid 2760: LOG:  Setting up socket for :::9999
2015-03-06 01:57:56: pid 2760: LOG:  pgpool-II successfully started. version 3.4.0 (tataraboshi)
2015-03-06 01:57:56: pid 2760: LOG:  find_primary_node: checking backend no 0
2015-03-06 01:57:56: pid 2760: LOG:  find_primary_node: primary node id is 0
2015-03-06 01:58:02: pid 2792: ERROR:  unable to read message length
2015-03-06 01:58:02: pid 2792: DETAIL:  message length (8) in slot 1 does not match with slot 0(12)

Debug log file:

2015-03-06 01:55:07: pid 2680: LOG:  find_primary_node: primary node id is 0
2015-03-06 01:55:42: pid 2712: DEBUG:  I am 2712 accept fd 6
2015-03-06 01:55:42: pid 2712: DEBUG:  reading startup packet
2015-03-06 01:55:42: pid 2712: DETAIL:  Protocol Major: 1234 Minor: 5679 database:  user: 
2015-03-06 01:55:42: pid 2712: DEBUG:  selecting backend connection
2015-03-06 01:55:42: pid 2712: DETAIL:  SSLRequest from client
2015-03-06 01:55:42: pid 2712: DEBUG:  reading startup packet
2015-03-06 01:55:42: pid 2712: DETAIL:  application_name: psql
2015-03-06 01:55:42: pid 2712: DEBUG:  reading startup packet
2015-03-06 01:55:42: pid 2712: DETAIL:  Protocol Major: 3 Minor: 0 database: postgres user: enterprisedb
2015-03-06 01:55:42: pid 2712: DEBUG:  creating new connection to backend
2015-03-06 01:55:42: pid 2712: DETAIL:  connecting 0 backend
2015-03-06 01:55:42: pid 2712: DEBUG:  creating new connection to backend
2015-03-06 01:55:42: pid 2712: DETAIL:  connecting 1 backend
2015-03-06 01:55:42: pid 2712: DEBUG:  reading message length
2015-03-06 01:55:42: pid 2712: DETAIL:  slot: 0 length: 12
2015-03-06 01:55:42: pid 2712: DEBUG:  reading message length
2015-03-06 01:55:42: pid 2712: DETAIL:  slot: 1 length: 8
2015-03-06 01:55:42: pid 2712: ERROR:  unable to read message length
2015-03-06 01:55:42: pid 2712: DETAIL:  message length (8) in slot 1 does not match with slot 0(12)

psql command:

[enterprisedb@testing_ppas93]$ psql -h localhost -p 9999 -d postgres
psql: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request

I use pgpool-II to handle failover between two servers (master on Google Cloud, slave on my local machine; CentOS x64; database: Postgres Plus Advanced Server 9.3). When I try to connect to my database via port 9999 (for example: psql -h localhost -p 9999 -d postgres), pgpool raises the error:

ERROR: Unable to read message length

Please help: Kerberos authentication setup on PostgreSQL

Requirement: the backend runs on RHEL with PostgreSQL, and the front end runs on Windows with .NET. Please advise how the authentication should be done. Our main idea is to have the authentication done in the front end, with the credentials then passed to the backend. Example: Google. Once we log in to google.com we can access Google+, Google Drive, and Google Apps without providing the password again.
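From my reading so far, the server-side piece would be a GSSAPI entry in pg_hba.conf along these lines (untested; the address range and the EXAMPLE.COM realm are placeholders):

# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   10.0.0.0/8    gss  include_realm=0  krb_realm=EXAMPLE.COM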

Update Statement in PostGIS ERROR

I am trying to update a column in PostGIS using information from a column in another table where they spatially intersect. I looked up how to do this on the PostgreSQL tutorial and I think that I should have the SQL correct. However, I am getting an error message that says:

ERROR: missing FROM-clause entry for table "buildingid"

This is what I have so far:

UPDATE sr_mld_dc
SET building_id = buildingid.washington_dc_2dbd
FROM washington_dc_2dbd
WHERE ST_CONTAINS(washington_dc_2dbd.geom, sr_mld_dc.geom);

So far I have tried to change the order of the lines with no success, and the current order is how the PostgreSQL tutorial set theirs up. If anybody could point me in the right direction that would be excellent.

UPDATE a value field for area of the geom and convert sq m to acres using cartodb

I’m trying to use the SQL commands in CartoDB to UPDATE a value field so that it shows the total acres of the area of the_geom. My SQL is pretty poor/entry-level and has some issues:

UPDATE table_name SET shape_area=(SELECT ST_Area(the_geom::geography) area_sqm 
FROM table_name)

I’m missing something… any help?

Postgres errors with Metatag module in combination with Drag and Drop File upload

After migrating from a MySQL 5.5 to a Postgres 9.3 database, I am encountering some issues in a Drupal OpenOutreach installation. I have a content type with a Drag & Drop Upload field to display images. Whenever I try to save new content, I receive the following error, caused by the Metatag module:

PDOException: SQLSTATE[25P02]: In failed sql transaction: 7 ERROR: current transaction is aborted, commands ignored until end of transaction block: SELECT language, '' FROM {metatag} WHERE (entity_type = :type) AND (entity_id = :id) AND (revision_id = :revision); Array ( [:type] => file [:id] => 206 [:revision] => 0 ) in metatag_metatags_save() (line 518 of <drupal-instance>/var/www/arsleonis/profiles/openoutreach/modules/contrib/metatag/metatag.module)

The following code (from metatag_metatags_save()) is causing the error:

// Handle scenarios where the metatags are completely empty.
if (empty($metatags)) {
  $metatags = array();
  // Add an empty array record for each language.
  $languages = db_query("SELECT language, ''
      FROM {metatag}
      WHERE (entity_type = :type)
      AND (entity_id = :id)
      AND (revision_id = :revision)",
    array(
      ':type'     => $entity_type,
      ':id'       => $entity_id,
      ':revision' => $revision_id,
    ))->fetchAllKeyed();
  foreach ($languages as $oldlang => $empty) {
    $metatags[$oldlang] = array();
  }
}

On MySQL everything worked properly, and it also works on a clean (new) install that used a Postgres DB from the beginning.

I have already tried installing different versions of the modules, but that made no difference. Unfortunately I don’t even know where to look for the cause of this error.

Is there a way to refresh pgAdmin's schema browser via SQL? [on hold]

I use pgAdmin III, and every time I run a SQL query, the database structure doesn’t refresh automatically. For instance: dropping a table doesn’t make the table disappear from the database tree on the left of the GUI. I have to click the schema where the table was and click the ‘refresh’ button. My question: is there a SQL command I could write after my DROP TABLE... in order to do this automatically?

Search results mismatch after postgresql update

Today we had to update the PostgreSQL version on our database servers. Before the update the version was 9.3.4, and the new one is 9.3.6. Despite the small difference in the database version and the absolutely identical configs, the search results changed drastically. The number of results decreased severalfold for some search queries. Rows added today (after the update) are found every time, while some older ones can no longer be found by a given phrase.

We use tsvector to perform full text search.

Any ideas what might be causing this?

Example query:

SELECT * FROM content, to_tsquery('bulgarian_utf8', '{keyword}') q
WHERE (tsv_title @@ q OR tsv @@ q)
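One comparison I was planning to run on both versions (a sketch; the phrase is just a placeholder) is to check whether tokenization and stemming still produce the same lexemes under the bulgarian_utf8 configuration:

SELECT alias, token, lexemes
FROM   ts_debug('bulgarian_utf8', 'примерна фраза за търсене');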

Can I setup database triggers to run on a defined secondary set of servers only? [closed]

This might sound a bit off, but here’s what I have been thinking for a while now:

Use Case

You want to build an activity log of each user action in your application using database (PostgreSQL) triggers on every relevant table, writing to an activity_log table.
The triggers should do the trick, but how do we avoid the burden of every user action also firing a trigger on the production servers and delaying the whole application?

Proposed Architecture

What I have in mind is a complex structure where one or more secondary Postgres nodes would take on the entire activity_log trigger workload.
The triggers would be disabled on all primary nodes (the ones the application reads from and writes to) and enabled on some/all secondary nodes (let’s call them “workers”).
Data would be written to a primary server (no trigger runs) and replication would forward it to all other nodes. When a “worker” node gets the data, the triggers process it and update the activity_log.
The activity_log table should be replicated across ALL servers, which means that a “worker” node should be able to read, write, and send selected data upstream.

Is there anything even close to this?
Is this even possible to achieve without having to rewrite a replication model from scratch?
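The closest built-in mechanism I have found so far is the per-trigger firing mode tied to session_replication_role (a sketch; table and trigger names are placeholders, and as far as I can tell this applies to trigger-based or logical replication, since a read-only streaming standby cannot fire writing triggers at all):

-- A REPLICA trigger fires only while session_replication_role = 'replica',
-- i.e. on nodes applying replicated changes, not on the origin:
ALTER TABLE some_table ENABLE REPLICA TRIGGER log_activity_trigger;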

Making ordinal synonyms for Postgres full text search

I have a Postgres 9.3 database where users are mostly searching for locations by a combination of name and/or address (or fragments of).

An example of an address would be “123 8th st”.

I’ve been able to setup a synonym dictionary that allows users to find such an address with “123 8th street”, but I can’t seem to get it to do the same for ordinals. That is, I want a search for “123 8 street” (etc) to be able to find this address.

I’m using the following code:

CREATE TEXT SEARCH CONFIGURATION my_app_english (
  COPY = english
);

CREATE TEXT SEARCH DICTIONARY my_app_synonyms (
  TEMPLATE = synonym,
  SYNONYMS = my_app_synonyms
);

ALTER TEXT SEARCH CONFIGURATION my_app_english
  ALTER MAPPING FOR asciiword
  WITH my_app_synonyms, english_stem;

And the synonym file had lines like:

1st 1
2nd 2
3rd 3
4th 4
5th 5
...

How can I make this full text search match “8th” when the user searches for “8”?

I suspect the token type is the issue, but I am not sure how else I can consistently treat tokens containing ordinals like that.
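To see which token types are involved, I have been inspecting the parse like this (my suspicion is that “8th” comes out as a numword rather than an asciiword, so the asciiword mapping above never sees it):

SELECT alias, token
FROM   ts_debug('my_app_english', '123 8th st');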

postgresql bdr 0.8.x – adding another downstream server only partially works

I have the following postgresql 9.4 bdr setup:

  • upstream server with db called “bdrdemo” running on 10.1.1.1
  • downstream server(1) with db called bdrdemo running on 10.2.2.2 (replicates with 10.1.1.1)
  • downstream server(2) with db called “newname” running on 10.3.3.3 (replicates with 10.1.1.1)

When I set up downstream server 2, I purposely used a different database name to test whether database names matter. It looks like all the data from bdrdemo running on 10.1.1.1 copied over properly, but when I make new changes from the upstream, or from downstream 2, nothing is replicated between the two.

I see an error in the logs on the upstream server that says:

Mar 30 19:44:38 testbox postgres[2745]: [339-1] d= p=2745 a=FATAL: 3D000: database “newname” does not exist

What I’ve checked so far:

  1. I checked the bdr.bdr_nodes table, and it now shows 3 entries instead of the two it had before I created the new downstream server.

    select * from bdr.bdr_nodes

    node_sysid      | node_timeline | node_dboid | node_status 
    ---------------------+---------------+------------+-------------
    6127254639323810674 |             1 |      16385 | r
    6127254604756301413 |             1 |      16384 | r
    6132048976759969713 |             1 |      16385 | r
    (3 rows)
    
    bdrdemo=#
    
  2. the postgresql.conf file on the upstream server has the following settings:

    #-------------------------------------------                                                                                          
    # BDR connection configuration for upstream                                         
    #-------------------------------------------                                                                   
    
    bdr.connections = 'bdrdownstream,bdrdownstream2'                                
    bdr.bdrdownstream_dsn = 'dbname=bdrdemo host=10.2.2.2 user=postgres port=5432'                              
    bdr.bdrdownstream2_dsn='dbname=newname host=10.3.3.3 user=postgres port=5432'     
    

Edit 1

Downstream server 1’s configuration (this server/node is working)

# BDR connection configuration for upstream node.                                
#-------------------------------------------                                       

bdr.connections = 'bdrupstream'                                                  

bdr.bdrupstream_dsn = 'dbname=bdrdemo host=10.1.1.1 user=postgres port=5432'  
bdr.bdrupstream_init_replica = on                                           
bdr.bdrupstream_replica_local_dsn = 'dbname=bdrdemo user=postgres port=5432'   

Downstream server 2’s configuration (this server/node is NOT working)

# BDR connection configuration for upstream node.                                
#-------------------------------------------------                              
bdr.connections = 'bdrupstream'                                                
bdr.bdrupstream_dsn = 'dbname=bdrdemo host=10.1.1.1 user=postgres port=5432'
bdr.bdrupstream_init_replica = on                                              
bdr.bdrupstream_replica_local_dsn = 'dbname=newname user=postgres port=5432'

EDIT 2

After adding the local database name to downstream 2’s configuration, I restarted the database on downstream 2. Replication was still not working, so I restarted the upstream server. Still no go.
Then I checked the logs on downstream 2 and I see this:

d=newname p=16791 a=pg_restore NOTICE:  42710: extension "btree_gist" already exists, skipping
d=newname p=16791 a=pg_restore LOCATION:  CreateExtension, extension.c:1208
d=newname p=16791 a=pg_restore NOTICE:  42710: extension "bdr" already exists, skipping
d=newname p=16791 a=pg_restore LOCATION:  CreateExtension, extension.c:1208
d=newname p=16791 a=pg_restore NOTICE:  42710: extension "plpgsql" already exists, skipping
d=newname p=16791 a=pg_restore LOCATION:  CreateExtension, extension.c:1208
d=newname p=16791 a=pg_restore ERROR:  42P07: relation "newtable" already exists
d=newname p=16791 a=pg_restore LOCATION:  heap_create_with_catalog, heap.c:1056
d=newname p=16791 a=pg_restore STATEMENT:  CREATE TABLE newtable (
        id integer NOT NULL,
        fname character varying(60),
        lname character varying(60)
    );



pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 191; 1259 17130 TABLE newtable postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  relation "newtable" already exists
    Command was: CREATE TABLE newtable (
    id integer NOT NULL,
    fname character varying(60),
    lname character varying(60)
);



pg_restore to dbname=newname user=postgres port=5432 fallback_application_name='bdr (6132048976759969713,1,16384,): bdrupstream: init_replica restore' options='-c bdr.do_not_replicate=on -c bdr.permit_unsafe_ddl_commands=on -c bdr.skip_ddl_replication=on -c bdr.skip_ddl_locking=on' failed, aborting
d= p=16780 a=FATAL:  XX000: bdr: /usr/bin/bdr_initial_load exited with exit code 2

When I initially set up downstream 2, it did copy over all the data from the upstream, but it just wasn’t participating in the replication of new data / new changes. So I guess I can understand why it’s failing while trying to create objects that already exist.
But do I have to delete the data within the subscriber database and restart to get the replication working?

Storage snapshots for consistent backup of postgresql – different data and log volumes

We are running many Linux VMs in a VMware/shared-storage environment, each running its own instance of PostgreSQL (a mix of 9.0 and 9.3). Currently, each VM sits entirely on a single root partition/volume, and we’ve had great success (~8 years) using storage-based snapshots of the underlying VMFS volumes for our backup/restore process (and for replication to our DR site).

Due to the architecture of our storage, it would be advantageous to move the Postgres WAL files to a separate, non-cached, mostly-write volume to give us less cache churn on the storage side. With our storage (Nimble Storage), we can assign both volumes to a single protection/snapshot group, but I haven’t been able to get our vendor to confirm that the snapshots will happen at EXACTLY the same time across all volumes in the protection group. They likely will, but there’s always the chance that they are milliseconds apart.

To that end, we ran some experiments, all while writing data to the DB as fast as possible using pgbench. After each experiment, we restored the snapshotted volumes and started the VM and PostgreSQL:

  • Snapshot both data and log volumes close to simultaneously – result: DB recovered
  • Snapshot data volume first, log volume ~1 minute later – result: DB recovered
  • Snapshot log volume first, data volume ~1 minute later – result: DB recovered
  • Snapshot log volume first, data volume ~3 minutes later, after a WAL checkpoint wrote new data to datafiles: result: DB recovered

So testing seems to tell us that as long as both snapshots are consistent at the volume level and relatively close together, you get a consistent copy of the DB, based on the time of the WAL/log volume snapshot.

My question: Is this safe? What are the corner cases we are missing in our testing, and what could go wrong?

The Postgres documentation indicates this is not safe, but testing seems to indicate it’s pretty robust:
http://www.postgresql.org/docs/9.1/static/backup-file.html

If your database is spread across multiple file systems, there might not be any way to obtain exactly-simultaneous frozen snapshots of all the volumes. For example, if your data files and WAL log are on different disks, or if tablespaces are on different file systems, it might not be possible to use snapshot backup because the snapshots must be simultaneous. Read your file system documentation very carefully before trusting the consistent-snapshot technique in such situations.

NOTE: Yes, we know about other options to ensure consistency, like putting PostgreSQL into hot backup mode or using our storage’s VMware integration to quiesce the VMs themselves, but we are looking for a storage-only solution for speed, convenience, and zero impact on our clients.
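For completeness, the hot-backup-mode bracket we are trying to avoid scripting around every snapshot would look roughly like this (a sketch):

SELECT pg_start_backup('storage_snapshot', true);  -- before triggering the snapshots
-- ... take the storage snapshots of both volumes ...
SELECT pg_stop_backup();                            -- after the snapshots complete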

Postgres ON DELETE Rule Not Working With WHERE Clause

I’m trying to implement “Soft Deletes” given the following schema:

CREATE TABLE categories(
  id serial not null primary key,
  num integer,
  "name" text,
  deleted_at timestamp default null
);

CREATE OR REPLACE RULE delete_categories AS
  ON DELETE TO categories
  WHERE old.deleted_at IS NULL
  DO INSTEAD
    UPDATE categories SET deleted_at = NOW()
    WHERE categories.id = old.id;

The expected behavior is that if I try to delete a record with a NULL deleted_at value, it will instead be set to the current timestamp. If I try to delete a record with a non-NULL deleted_at value, it will be deleted normally.

Instead, running the below sequence returns no records, instead of a record with a timestamp in the deleted_at column:

insert into categories(num,name,deleted_at) values(9999,'Test Category',null);
delete from categories;
select * from categories;

It appears as though the RULE does not get triggered at all, and the record is simply deleted, whereas if I modify the RULE by commenting out the WHERE clause, the record is updated as expected but I am barred from being able to delete it fully:

CREATE OR REPLACE RULE delete_categories AS
  ON DELETE TO categories
  -- WHERE old.deleted_at IS NULL
  DO INSTEAD
    UPDATE categories SET deleted_at = NOW()
    WHERE categories.id = old.id;

insert into categories(num,name,deleted_at) values(9999,'Test Category',null);
delete from categories;
select * from categories;

Results:

+----+------+---------------+----------------------------+
| id | num  |     name      |         deleted_at         |
+----+------+---------------+----------------------------+
|  3 | 9999 | Test Category | 2015-03-03 20:05:44.660208 |
+----+------+---------------+----------------------------+
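If rules turn out to be the wrong tool for this, the fallback I am considering is a BEFORE DELETE trigger instead (a sketch, untested):

CREATE OR REPLACE FUNCTION soft_delete_categories() RETURNS trigger AS $$
BEGIN
  IF OLD.deleted_at IS NULL THEN
    -- First delete attempt: mark the row instead of removing it.
    UPDATE categories SET deleted_at = now() WHERE id = OLD.id;
    RETURN NULL;  -- skip the physical DELETE
  END IF;
  RETURN OLD;     -- already soft-deleted: allow the real DELETE
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER soft_delete_categories
  BEFORE DELETE ON categories
  FOR EACH ROW EXECUTE PROCEDURE soft_delete_categories();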