One of my clients were reporting very slow behavior on adding data to the application. After jumping into my vehicle, deflecting incoming traffic, I saw that the situation was critical indeed.
The puzzle began with a piece of information revealed in the pg_stat_activity table, several queries were queuing up for no reason. Locks were held. Screams were yelled.
This particular client runs a mission critical system on a straighforward Apache/PHP/PostgreSQL 9.0 stack, and this system merrily did it's work until a recent clone and restore using Clonezilla. All was well when after system boot and several tests, until a few days later...
Symptoms for the problem were:
- Simple queries taking forever to complete (some of them 5 hours).
- Transactional locks were held due to these queries.
- Piling up of new queries.
To make a long battle opera short, the eventual desparate solution was to VACUUM the tables that caused the delay.
The dust eventually settled, and all was well.
My recommendation for anybody doing a bare-metal clone and restore of a system running a database, always perform a VACUUM/ANALYZE or equivalent on the database after a full system restore. I thus theorise that clone/restore does not place the data files in exactly the same position.