During btrfs balance "enospc" errors are reported
This document (000019789) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11
Situation
BTRFS info (device sda1): 1 enospc errors during balance
BTRFS info (device sda1): balance: ended with status: -28
NOTE: This is not the same ENOSPC condition that is reported when free space is exhausted.
Resolution
A filter that does not require workspace is usage=0. This will scan through all unused block groups of a given type and will reclaim the space. After that it might be possible to run other filters. So running the following could solve the issue:
# btrfs balance start -dusage=0 -musage=0 <mount point of affected filesystem>
If it does not, and snapshotting is in use, consider removing unused snapshots:
# snapper ls
This lists the snapshots and how much space they are using. Consider whether they are necessary or useful, and then use snapper commands to remove the ones that aren't. See the snapper man pages for more on how to use snapper effectively.
Deleting any large unnecessary files can also help. After the snapshots and large files are deleted, retry the btrfs balance command from the beginning of this guide:
# btrfs balance start -dusage=0 -musage=0 <mount point of affected filesystem>
If this succeeds, then try:
# btrfs balance start -dusage=5 -musage=5 <mount point of affected filesystem>
Alternatively, start with a similarly low filter value and work your way up. Because these filters only relocate chunks with low usage, balance operations with single-digit filter values should not put a large strain on I/O.
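The "work your way up" approach described above can be sketched as a small shell loop. This is only an illustration, not a SUSE-provided script: the function names, the /mnt mount point and the list of filter values are assumptions. By default it runs in a dry-run mode that only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: raise the usage filter step by step and stop at the first failure.
# The mount point (/mnt) and the filter values are illustrative assumptions.
# DRY_RUN=1 (the default here) prints the commands instead of running them.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stepwise_balance() {
    mountpoint="$1"
    shift
    for pct in "$@"; do
        # Same filter value for data (-d) and metadata (-m), as in this article.
        run btrfs balance start "-dusage=$pct" "-musage=$pct" "$mountpoint" || {
            echo "balance failed at usage=$pct" >&2
            return 1
        }
    done
}

stepwise_balance /mnt 0 5 10 20
```

To run the commands for real, set DRY_RUN=0 and replace /mnt with the mount point of the affected filesystem. Stopping at the first failing value shows how far the filter could be raised with the work space currently available.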
Finally, expanding the filesystem can also create enough space for the balance to complete successfully.
The following link is to a SUSE knowledgebase article that shows how a btrfs root filesystem can be expanded:
TID 000018798 - How to resize/extend a btrfs formatted root partition
Make sure the linked article applies to your particular situation. For systems to which it does not apply, use SUSE and storage vendor documentation and follow filesystem expansion best practices.
Cause
This issue is known to occur on btrfs filesystems, particularly those with relatively high filesystem usage (>80%).
Low working space for balance
Because of the way balance operates, it usually needs to temporarily create a new block group and move the old data there before the old block group can be removed. This requires work space; without it, the balance fails with ENOSPC. This is not the same ENOSPC that is reported when free space is exhausted. It refers to space at the level of block groups, which are larger units of the filesystem that contain many file extents.
The available work space is shown in the output of the btrfs filesystem usage command:
Overall:
Device size: 202.55GiB
Device allocated: 15.05GiB
Device unallocated: 187.50GiB <--- unallocated is the relevant figure
Device missing: 0.00B
Used: 12.94GiB
Free (estimated): 188.81GiB (min: 188.81GiB)
Data ratio: 1.00
Metadata ratio: 1.00
Global reserve: 27.31MiB (used: 0.00B)
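For scripted checks, the unallocated figure flagged above can be pulled out of the command output. The following is a minimal sketch: the function name unallocated_of is hypothetical, and the sample input is a shortened copy of the output shown above with the annotation removed.

```shell
#!/bin/sh
# Sketch: extract the "Device unallocated" value from usage output.
# unallocated_of is a hypothetical helper name, not part of btrfs-progs.

unallocated_of() {
    # Reads `btrfs filesystem usage` output on stdin and prints the value
    # that follows the "Device unallocated:" label, with whitespace trimmed.
    awk -F: '/Device unallocated/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }'
}

# Sample input copied from this article. On a live system the output of
# `btrfs filesystem usage <mount point>` would be piped in instead.
unallocated_of <<'EOF'
Overall:
    Device size:                 202.55GiB
    Device allocated:             15.05GiB
    Device unallocated:          187.50GiB
EOF
```

The value is printed as text, including its unit, exactly as btrfs reports it.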
Since the working space should be contiguous, paradoxical situations can arise in which there seems to be more than enough space based on usage filters alone (e.g. the balance fails at 87% filesystem usage even with a -dusage=10 filter). This is more likely to happen on a filesystem where many files have been changed, such as after a system upgrade.
Conversions on multiple devices
Conversion to profiles based on striping (RAID0, RAID5/6) requires work space on each device. An interrupted balance may leave partially filled block groups that consume the work space.
The skip_balance mount option can help work around this issue. This mount option doesn't skip balancing; rather, it skips the automatic resumption of an interrupted balance operation. The operation can later be resumed with btrfs balance resume, or the paused state can be removed with btrfs balance cancel. The default behaviour is to resume an interrupted balance immediately after a volume is mounted, which, as mentioned, can cause issues for striped volumes.
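The sequence described above (mount with skip_balance, inspect the paused balance, then resume or cancel it) could look roughly like the sketch below. The function name, /dev/sdX1 and /mnt are placeholders chosen for illustration. By default it runs in a dry-run mode that only prints the commands.

```shell
#!/bin/sh
# Sketch: mount without auto-resuming a paused balance, then resume or cancel.
# /dev/sdX1 and /mnt are placeholders; DRY_RUN=1 (default) only prints.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

handle_paused_balance() {
    device="$1"
    mountpoint="$2"
    action="$3"   # expected to be "resume" or "cancel"
    case "$action" in
        resume|cancel) ;;
        *)
            echo "usage: handle_paused_balance <device> <mountpoint> resume|cancel" >&2
            return 2
            ;;
    esac
    run mount -o skip_balance "$device" "$mountpoint" || return 1
    # The exit status of the status query is ignored; it is informational here.
    run btrfs balance status "$mountpoint" || true
    run btrfs balance "$action" "$mountpoint"
}

handle_paused_balance /dev/sdX1 /mnt cancel
```

Whether to resume or cancel depends on whether the interrupted balance is still wanted; cancelling only removes the paused state.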
Note that RAID5/6 is considered experimental in btrfs as of the writing of this article.
Additional Information
Filesystem balancing should be run regularly, for example via cron scripts or at startup. It can also be run manually from the command line.
There are other ways to get balancing to work when working space is low, including temporarily adding a device to the filesystem. More information on that method can be found in this blog article:
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html
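As a rough outline of the temporary-device method mentioned above, the steps could be sketched as follows. This is an assumption-laden illustration rather than the blog's exact procedure: the function name, /dev/sdX and /mnt are placeholders, and the usage=5 filter is simply the value used earlier in this article. It runs in a dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Sketch: lend the filesystem a spare device so balance has work space.
# /dev/sdX and /mnt are placeholders; DRY_RUN=1 (default) only prints.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

balance_with_spare() {
    spare="$1"
    mountpoint="$2"
    run btrfs device add "$spare" "$mountpoint" || return 1
    # Low usage filter, matching the values suggested earlier in this article.
    rc=0
    run btrfs balance start -dusage=5 -musage=5 "$mountpoint" || rc=$?
    # Always try to take the spare device out again. Removal moves its
    # block groups back to the remaining device(s) and can take a while.
    run btrfs device delete "$spare" "$mountpoint" || return 1
    return "$rc"
}

balance_with_spare /dev/sdX /mnt
```

The spare device must stay attached until the removal step has finished, because the filesystem spans both devices in the meantime.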
The following are pages from the btrfs project's wiki. The first link is of particular interest:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance#ENOSPC
https://btrfs.wiki.kernel.org/index.php/Balance_Filters
https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F
The btrfs man pages contain a wealth of information on this and other filesystem topics. Type:
man btrfs
Pressing the 'Tab' key then triggers tab completion, which shows several topics to choose from, including:
man btrfs-balance
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019789
- Creation Date: 18-Nov-2020
- Modified Date: 27-Jan-2021
- SUSE Linux Enterprise Desktop
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com