General Problem: I have a table of rasters. The rasters are classified MSI images, so each pixel is an integer indicating the class of the pixel. For a number of regions, I am querying pixel counts in those regions, i.e., making a table like:
filename | total pixel count | pixels in class 0 | pixels in class 1 | ...
Specific Problem: My problem is that the script takes a long time, so I want to get the runtime down.
What I’ve tried: Full disclosure, I’m not well versed in postgresql. Also, note that in the two code snippets below, the most inner sub-selects are identical. Here’s my first attempt:
EXPLAIN ANALYZE SELECT
filename AS filename,
ST_Count(rast,1) AS totalpixels,
(ST_ValueCount(rast,1,false,ARRAY[0.0])).count AS nodata,
(ST_ValueCount(rast,1,false,ARRAY[1.0])).count AS lowveg,
(ST_ValueCount(rast,1,false,ARRAY[2.0])).count AS highveg,
(ST_ValueCount(rast,1,false,ARRAY[12.0])).count AS clouds,
(ST_ValueCount(rast,1,false,ARRAY[13.0])).count AS shadow
FROM
(
SELECT
filename,
ST_Clip(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) AS rast
FROM
rasters
WHERE
ST_Intersects(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326))
) AS source_rasters;
This runs in 185 ms. I thought surely it’s sub-optimal to ST_ValueCount
so many times. So here’s my improved attempt — running it once and converting the SETOF result to an array so that I can index the values:
EXPLAIN ANALYZE SELECT
filename AS filename,
totalpixels AS totalpixels,
pxcnt[1] AS nodata,
pxcnt[2] AS lowveg,
pxcnt[3] AS highveg,
pxcnt[4] AS clouds,
pxcnt[5] AS shadow
FROM
(
SELECT
filename AS filename,
ST_Count(rast,1,false) AS totalpixels,
ARRAY( SELECT count FROM ST_ValueCount(rast,1,false,ARRAY[0.0,1.0,2.0,12.0,13.0]) ) AS pxcnt
FROM
(
SELECT
filename,
ST_Clip(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326)) AS rast
FROM
rasters
WHERE
ST_Intersects(rast,ST_GeomFromText('POLYGON ((125.229490000007 6.900509999999138, 125.2404900000019 6.900509999999138, 125.2404900000019 6.889510000004179, 125.229490000007 6.889510000004179, 125.229490000007 6.900509999999138))',4326))
) AS source_rasters
) AS f;
But it only reduced the runtime to 155 ms.
But then I thought maybe the ST_ValueCount is only a small portion of the total work cost, so this is all the improvement I could expect. However, if I reference just one result:
EXPLAIN ANALYZE SELECT
filename AS filename,
totalpixels AS totalpixels,
pxcnt[1] AS nodata
FROM
...
It cuts the runtime down to 55 ms, which I don’t understand because it seems to have done all the work of intersecting, clipping, counting, etc.
Question: So is there a faster way to unpack the results of ST_ValueCount, or an obvious way to speed this up in general?
Just for what it’s worth, I’ve made other incremental improvements since I began, e.g., tiling the rasters was a big improvement. At this point, this seems like the most likely opportunity for a significant improvement, please let me know if I might be wrong about that.