Quantcast
Channel: Question and Answer » postgresql
Viewing all articles
Browse latest Browse all 1138

Concurrent SELECT and UPDATE in multiprocessing python

$
0
0

I have to randomly match about 20 million rows in two tables without using the same row twice using PostgreSQL 9.3.5. Both tables have the following structure

CREATE TABLE de_sim_points_end
(
-- Inherited from table de_sim_points:  id integer NOT NULL DEFAULT nextval('de_sim_points_id_seq'::regclass),
-- Inherited from table de_sim_points:  parent_geometry character varying(12),
-- Inherited from table de_sim_points:  used boolean DEFAULT false,
-- Inherited from table de_sim_points:  geom geometry(Point,900913),
  CONSTRAINT de_sim_points_end_pkey PRIMARY KEY (id)
)
INHERITS (de_sim_points)
WITH (
  FILLFACTOR=50, 
  OIDS=FALSE
);
ALTER TABLE de_sim_points_end
  OWNER TO benjamin;

CREATE INDEX de_sim_points_end_geom_idx
  ON de_sim_points_end
  USING gist
  (geom)
  WITH (FILLFACTOR=100);
ALTER TABLE de_sim_points_end CLUSTER ON de_sim_points_end_geom_idx;

CREATE INDEX de_sim_points_end_parent_relation_idx
  ON de_sim_points_end
  USING btree
  (parent_geometry COLLATE pg_catalog."default")
  WITH (FILLFACTOR=100);

CREATE INDEX de_sim_points_end_used_idx
  ON de_sim_points_end
  USING btree
  (used)
  WITH (FILLFACTOR=50);

The SQL Query and Python Code used to match the rows can be found at https://github.com/boerngen-schmidt/commuter-simulation/blob/master/code/builder/process_point_mass_matcher.py (I will not post it here due to its length). But basically what I do is:

  1. SELECT rows, randomly ordered from start point table
  2. JOIN rows, randomly ordered from destination point table
  3. Take the result, iterate over it and update the points in start and destination table as used
  4. Save the result in another table (done by another thread)

When I was using multiple processes with SELECT … FOR UPDATE I ran into deadlocks (currently I just use one process, so the FOR UPDATE statement is currently omitted). So far I have read a bit about SERIALIZABLE and so forth locking mechanisms, but yet I fail to fully grasp them.

Is it possible to make the matching multiprocessing compatible? How whould the statements look like?


Viewing all articles
Browse latest Browse all 1138

Trending Articles