# Recovering a Broken Proxmox LXC After Running Out of Disk Space
Recently I ran into a serious issue in my homelab: several LXC containers on my Proxmox server suddenly refused to start.
The root cause turned out to be a completely full storage volume, aggravated by large RAW disk images and duplicated data.
This post is a short overview of how I recovered the system, migrated my Calibre setup to a cleaner structure, and improved the deployment architecture.
## The Initial Problem
Several containers began failing with errors similar to:
```
Failed to run lxc.hook.pre-start
can't read superblock on /dev/loopXX
```
After checking disk usage, I discovered one of my storage directories had reached 100% utilization.
The main offender was a large Calibre LXC containing:
- duplicated Calibre libraries
- large RAW rootfs images
- Docker overlay data
- old backups and temporary copies
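Before deleting anything, it helps to see exactly where the space went. A rough sketch of the kind of commands I leaned on (the `/var/lib/vz` path, the `biggest`/`big_raws` helper names, and the 10G threshold are illustrative, not fixed):

```shell
# Show the largest directories under a storage mount.
# -x stays on one filesystem; sort largest-first, human-readable.
biggest() {
  du -xh --max-depth=2 "$1" 2>/dev/null | sort -rh | head -n 15
}

# List RAW disk images above a size threshold (e.g. 10G).
big_raws() {
  find "$1" -name '*.raw' -size "+$2" 2>/dev/null || true
}

# On Proxmox, directory storage typically lives under /var/lib/vz
# or /mnt/pve/<storage>; adjust the path to your setup.
biggest /var/lib/vz
big_raws /var/lib/vz 10G
```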
## Immediate Recovery
The first step was reclaiming space quickly:
- stopped all affected containers
- identified large RAW images
- moved unnecessary data off the full storage pool
- removed stale duplicate copies
- cleaned up unused LXC disks
Once enough free space was available again, the containers started working normally.
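The sequence above can be sketched as a dry-run script. The container ID 105, the mount-point name, the volume ID, and the destination path are placeholders, and the `run` wrapper only prints each command so nothing destructive happens by accident:

```shell
# Dry run: commands are printed, not executed.
# Swap the echo out once you've double-checked IDs and paths.
run() { echo "+ $*"; }

run pct stop 105                                  # stop the affected container
run mv /var/lib/vz/dump/old-backups /mnt/usb/     # move data off the full pool
run pct set 105 --delete mp1                      # drop an unused mount point
run pvesm free StorageDirectory:105/vm-105-disk-1.raw  # free an unused disk
run pct start 105                                 # start again once space is free
```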
## Calibre GUI Issues
After recovering the storage, the LinuxServer Calibre GUI stopped functioning correctly.
The newer Calibre container now uses a Selkies/WebSocket-based remote GUI stack which requires additional ports and HTTPS handling.
The fix was exposing the required ports correctly:
```yaml
ports:
  - 8080:8080
  - 8081:8081
  - 8082:8082
  - 8181:8181
```
Without the WebSocket port, the frontend would partially load but fail to function properly.
## Reorganizing the Deployment
Originally, the deployment stored everything under /root/calibre, including:
- library files
- metadata
- configs
- GUI state
- temporary caches
This made backups and recovery difficult.
I migrated everything to a cleaner structure:
```
/mnt/pve/StorageDirectory/docker/
├── calibre-gui/config
├── calibre-web/config
├── calibre-library
├── backups
└── compose
```
This separates:
- persistent library data
- application configs
- runtime state
- backups
This separation makes the system much easier to maintain.
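As a concrete sketch, a docker-compose service wired into this layout might look as follows. The image reference, the PUID/PGID values, and the `/library` container path are assumptions for illustration, not my exact file:

```yaml
services:
  calibre-gui:
    image: lscr.io/linuxserver/calibre:latest   # assumed image; pin a tag in practice
    environment:
      - PUID=1000   # match the owner of the library files on the host
      - PGID=1000
    volumes:
      - /mnt/pve/StorageDirectory/docker/calibre-gui/config:/config
      - /mnt/pve/StorageDirectory/docker/calibre-library:/library
    ports:
      - 8080:8080
      - 8081:8081
      - 8082:8082
      - 8181:8181
    restart: unless-stopped
```

Because the library and config live on the host side of the bind mounts, the container itself stays disposable.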
## Backup Improvements
I also added:
- metadata database backups
- config backups
- periodic library snapshots
- retention cleanup
The most important file in a Calibre deployment is `metadata.db`. Protecting that file is critical.
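A minimal cron-friendly sketch of the metadata backup with retention cleanup. The paths and the 30-day window reflect my layout; adjust as needed:

```shell
# Paths from my reorganized layout; change these for your setup.
LIBRARY=/mnt/pve/StorageDirectory/docker/calibre-library
BACKUPS=/mnt/pve/StorageDirectory/docker/backups

backup_metadata() {
  mkdir -p "$BACKUPS"
  # Note: if Calibre may be writing, prefer sqlite3's ".backup" command
  # over a plain cp to avoid copying the database mid-write.
  cp "$LIBRARY/metadata.db" "$BACKUPS/metadata-$(date +%Y%m%d-%H%M%S).db"
  # Retention cleanup: drop backups older than 30 days.
  find "$BACKUPS" -name 'metadata-*.db' -mtime +30 -delete
}
```

Run `backup_metadata` from a daily cron job or systemd timer.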
## Shrinking the LXC Root Filesystem
Even after moving the library to external storage, the LXC still carried a 69 GB RAW root disk.
Instead of attempting a risky in-place shrink, I used a safer workflow:
- create a vzdump backup
- restore to a temporary CT with a smaller rootfs
- verify functionality
- destroy old CT
- restore back using the original CTID
This allowed me to safely reduce the container size while keeping the same container ID.
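Sketched as a dry run (CTID 105, the temporary CTID 999, the storage names, the archive path, and the 16 GB rootfs size are all placeholders; the `run` wrapper prints commands instead of executing them):

```shell
# Dry run: print each command rather than executing it.
run() { echo "+ $*"; }

DUMP=/mnt/pve/StorageDirectory/dump/vzdump-lxc-105.tar.zst

run vzdump 105 --storage StorageDirectory --mode stop --compress zstd
run pct restore 999 "$DUMP" --rootfs local-lvm:16   # smaller rootfs for testing
# ...verify CT 999 boots and the application works, then:
run pct destroy 105
run pct restore 105 "$DUMP" --rootfs local-lvm:16   # back under the original CTID
run pct destroy 999                                 # drop the temporary CT
```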
## Final Result
The final setup is:
- cleaner
- easier to back up
- easier to recover
- less prone to storage exhaustion
- easier to scale
The biggest lesson from this incident:
separate large persistent data from container root filesystems as early as possible.
This dramatically simplifies recovery when something eventually goes wrong.
## Closing Thoughts
This incident was a good reminder that homelab infrastructure needs:
- storage monitoring
- backup planning
- clean separation of stateful data
- periodic cleanup
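For the monitoring piece, even a tiny cron script beats nothing. A sketch (the mount point and the 90% threshold are examples):

```shell
# Warn when a mount point's usage crosses a threshold; run from cron.
check_usage() {
  local mount=$1 limit=$2 pcent
  # df --output=pcent prints e.g. " 43%"; strip everything but digits.
  pcent=$(df --output=pcent "$mount" | tail -n 1 | tr -dc '0-9')
  if [ "$pcent" -ge "$limit" ]; then
    echo "WARNING: $mount is at ${pcent}% (limit ${limit}%)"
  fi
}

check_usage / 90
```

Pointing this at the Proxmox storage mounts and mailing the output would have caught my problem long before containers started failing.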
Even relatively small mistakes like duplicated directories or oversized RAW images can cascade into larger failures when disk space runs out.
Fortunately, with some cleanup and restructuring, the system is now much healthier and easier to maintain going forward.