Accurate PG database size for comparing representations

I have a Java/JDBC program that takes a sample file and imports its data into the database, shredding it across multiple relations. The program repeats this for several different representations, one of which uses Large Objects. I could share more details of these representations, but they are quite long and not relevant to this question, since I'm looking for something generic.

I would like to compare the sizes of these representations by examining the size of the database after each import. The database is a local PostgreSQL 9.4 instance on Windows, with no other users and the default configuration; its only purpose is to run this test.

My initial plan was as follows:

for each representation {
  call VACUUM ANALYZE
  record old DB size with SELECT pg_tablespace_size('pg_default');
  import data into database
  call VACUUM ANALYZE
  record new DB size with SELECT pg_tablespace_size('pg_default');
  store storage cost as new DB size - old DB size
}
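
In JDBC terms, a stripped-down sketch of that loop might look like the following (the connection settings and the commented-out import call are placeholders, not my real program):

import java.sql.*;

public class StorageCostProbe {

    // Current size of the default tablespace, in bytes.
    static long tablespaceSize(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT pg_tablespace_size('pg_default')")) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // VACUUM cannot run inside a transaction block, so the connection
    // must be in autocommit mode (the JDBC default).
    static void vacuumAnalyze(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("VACUUM ANALYZE");
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/sizetest", "postgres", "secret")) {
            conn.setAutoCommit(true);
            vacuumAnalyze(conn);
            long before = tablespaceSize(conn);
            // importRepresentation(conn);  // placeholder for the actual import
            vacuumAnalyze(conn);
            long after = tablespaceSize(conn);
            System.out.printf("storage cost: %,d bytes%n", after - before);
        }
    }
}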

Obviously this approach has limitations, but my expectation is that for large input files (~100 MB) the reported storage costs will be reasonable approximations. Note that I use pg_tablespace_size in order to include the contribution of data held outside the main schema, such as large objects (in pg_catalog.pg_largeobject and pg_catalog.pg_largeobject_metadata).
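
As a sanity check that the large-object data really is counted, a helper along these lines (a sketch; pg_total_relation_size includes a relation's indexes and TOAST data) can report the catalog relations' contribution directly:

// Combined on-disk size of the large-object catalogs, in bytes.
static long largeObjectBytes(Connection conn) throws SQLException {
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT pg_total_relation_size('pg_catalog.pg_largeobject')"
           + " + pg_total_relation_size('pg_catalog.pg_largeobject_metadata')")) {
        rs.next();
        return rs.getLong(1);
    }
}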

I'm wondering whether this approach is correct, and whether there is a better one. I'm unsure whether VACUUM ANALYZE properly updates the figures reported by pg_tablespace_size, even though it is called in the same session. It would also be better if I could avoid VACUUM ANALYZE altogether, since running it on the pg_catalog relations requires connecting as a superuser.
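
For illustration, the superuser-only part is vacuuming the large-object catalogs: a database-wide VACUUM run as an ordinary role skips catalog tables it does not own (with a warning), so something like this has to run on a superuser connection:

// Must run on a superuser connection: ordinary roles cannot
// vacuum catalog tables they do not own.
static void vacuumLargeObjectCatalogs(Connection su) throws SQLException {
    try (Statement st = su.createStatement()) {
        st.execute("VACUUM ANALYZE pg_catalog.pg_largeobject");
        st.execute("VACUUM ANALYZE pg_catalog.pg_largeobject_metadata");
    }
}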

Any thoughts?

