I have a Java/JDBC program that takes a sample file and imports the data into the database, shredding it across multiple relations. The program does this multiple times for several different representations, one of which uses Large Objects. I can share more details of these representations, but they are quite long and not relevant to this question, since I’m looking for something generic.
I would like to compare the sizes of these different representations by examining the size of the database after each import. The database is a local PostgreSQL 9.4 instance on Windows, with no other users and the default configuration. Its only purpose is to conduct this test.
My initial plan was as follows:
for each representation {
    call VACUUM ANALYZE
    record old DB size with SELECT pg_tablespace_size('pg_default');
    import the data into the database
    call VACUUM ANALYZE
    record new DB size with SELECT pg_tablespace_size('pg_default');
    store storage cost as new DB size - old DB size
}
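In JDBC terms the loop would look roughly like the sketch below; importRepresentation is just a placeholder for my actual import code, and the connection details are made up:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StorageCostTest {

    // Size of the default tablespace in bytes; this also covers
    // pg_catalog.pg_largeobject and pg_catalog.pg_largeobject_metadata.
    static long tablespaceSize(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT pg_tablespace_size('pg_default')")) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // VACUUM cannot run inside a transaction block, so this relies on
    // auto-commit being enabled (the JDBC default).
    static void vacuumAnalyze(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("VACUUM ANALYZE");
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/testdb"; // placeholder connection details
        try (Connection conn = DriverManager.getConnection(url, "postgres", "secret")) {
            for (String representation : new String[] {"relational", "large-object"}) {
                vacuumAnalyze(conn);
                long before = tablespaceSize(conn);

                importRepresentation(conn, representation); // placeholder for the actual import

                vacuumAnalyze(conn);
                long after = tablespaceSize(conn);

                System.out.printf("%s: %d bytes%n", representation, after - before);
            }
        }
    }

    // Placeholder: shreds the sample file across the relations for this representation.
    static void importRepresentation(Connection conn, String representation) throws SQLException {
        // actual import code omitted
    }
}
```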
Obviously there are limitations to this approach, but my expectation is that for large files (~100 MB) the reported storage costs should be reasonable approximations. Note that I use pg_tablespace_size in order to include the contribution of data outside the main schema, such as large objects (stored in pg_catalog.pg_largeobject and pg_catalog.pg_largeobject_metadata).
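As a rough sanity check that the large-object data really is being counted, I can also size those two catalog relations directly (reusing the conn from the sketch above; pg_total_relation_size includes TOAST and indexes):

```java
try (Statement st = conn.createStatement();
     ResultSet rs = st.executeQuery(
         "SELECT pg_total_relation_size('pg_catalog.pg_largeobject')"
       + " + pg_total_relation_size('pg_catalog.pg_largeobject_metadata')")) {
    rs.next();
    System.out.println("large object storage: " + rs.getLong(1) + " bytes");
}
```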
I’m wondering whether this is a correct approach, and whether there is a better one. I’m unsure whether VACUUM ANALYZE guarantees that pg_tablespace_size reports up-to-date numbers, even though both are run in the same session. It would also be better if I could avoid calling VACUUM ANALYZE altogether, since running it on the pg_catalog relations requires connecting as a superuser.
Any thoughts?