Comments on: How to free 15GB of disk space in a tenth of a second
http://www.xaprb.com/blog/2012/09/14/how-to-free-15gb-of-disk-space-in-a-tenth-of-a-second/
Stay curious!
Thu, 02 May 2013 12:36:53 +0000

By: Raghavendra Prabhu
http://www.xaprb.com/blog/2012/09/14/how-to-free-15gb-of-disk-space-in-a-tenth-of-a-second/#comment-20267
Wed, 19 Sep 2012 22:47:52 +0000

A quick update on what I mentioned earlier.

It seems that an ENOSPC can indeed result; there is work in progress to fix that: http://oss.sgi.com/archives/xfs/2012-09/msg00179.html

By: Raghavendra Prabhu
http://www.xaprb.com/blog/2012/09/14/how-to-free-15gb-of-disk-space-in-a-tenth-of-a-second/#comment-20266
Wed, 19 Sep 2012 22:28:18 +0000

Yes, the ENOSPC angle exists.

However, a few things:

1. The pre-allocated space won’t exceed the “ondisk” file size.

alloc_blocks = XFS_B_TO_FSB(mp, XFS_ISIZE(ip)) + 1;
alloc_blocks = XFS_FILEOFF_MIN(MAXEXTLEN,
rounddown_pow_of_two(alloc_blocks));

The commit refers to 8 GB because the file there is larger
than MAXEXTLEN, which is 8 GB.

So, the preallocated space for a 4G file cannot exceed 4G.
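
The two quoted lines can be tried outside the kernel. What follows is a rough Python sketch, not the XFS code itself. It assumes a 4 KiB filesystem block size and takes MAXEXTLEN to be 2^21 - 1 blocks (just under 8 GiB at that block size, in line with the figure above). The helper names `bytes_to_blocks` and `prealloc_blocks` are made up for illustration.

```python
BLOCK_SIZE = 4096            # assumed filesystem block size in bytes
MAXEXTLEN = (1 << 21) - 1    # assumed maximum extent length, in blocks


def bytes_to_blocks(nbytes):
    """Round a byte count up to whole blocks, in the spirit of XFS_B_TO_FSB."""
    return (nbytes + BLOCK_SIZE - 1) // BLOCK_SIZE


def rounddown_pow_of_two(n):
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def prealloc_blocks(ondisk_size):
    """Mirror the two quoted lines: size-based guess, capped at MAXEXTLEN."""
    alloc_blocks = bytes_to_blocks(ondisk_size) + 1
    return min(MAXEXTLEN, rounddown_pow_of_two(alloc_blocks))


GIB = 1 << 30
for size_gib in (1, 3, 4, 16):
    blocks = prealloc_blocks(size_gib * GIB)
    print(size_gib, "GiB file ->", blocks * BLOCK_SIZE / GIB, "GiB preallocated")
```

Under these assumptions a 4 GiB file gets at most 4 GiB of preallocation, a 3 GiB file is rounded down to 2 GiB, and the MAXEXTLEN cap only comes into play for files of roughly 8 GiB and larger.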

I say “ondisk” because XFS does a lot of delayed
allocation, so the size in question is the size of the inode
(as reported by the VFS) on disk after the last flush.

2. As the commit details, xfs_iomap_write_delay also checks for ENOSPC and, if
there is one, disables preallocation and flushes all the inodes (including
the previously preallocated ones) to free up the preallocated space.

3. Tools like ls and stat don’t reveal it because the space may not
yet be allocated on disk, and on-disk allocation is what these tools check.

Now, when you say ENOSPC, was the filesystem merely tending towards it (as reported by df), or was
an actual ENOSPC returned by one of the system calls? If it is the latter, there
may be a bug in how or when the space is freed up.
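
To make that distinction concrete, here is a minimal Python sketch of the two observations: a df-style "how full is it" check via `os.statvfs`, and a write that actually fails with ENOSPC. The function names are hypothetical, and this only shows how an application could tell the two cases apart, not how XFS behaves internally.

```python
import errno
import os


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is free for unprivileged users."""
    st = os.statvfs(path)
    return st.f_bavail / st.f_blocks if st.f_blocks else 0.0


def is_enospc(exc):
    """True if an OSError carries the ENOSPC ("No space left on device") code."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


def try_append(path, data):
    """Append data and report whether the write itself failed with ENOSPC."""
    try:
        with open(path, "ab") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        return "ok"
    except OSError as exc:
        if is_enospc(exc):
            return "enospc"
        raise
```

A low `free_fraction()` with `try_append()` still returning "ok" is the "tending towards it" case; an "enospc" result is the case where a system call really returned the error.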

By: Xaprb
http://www.xaprb.com/blog/2012/09/14/how-to-free-15gb-of-disk-space-in-a-tenth-of-a-second/#comment-20262
Wed, 19 Sep 2012 18:25:38 +0000

What Jeremy said ^^^. In my case the filesystem is fairly small, and until I figured this out, it was pretty puzzling how it could be “full” and causing the DB server to crash when it was only “half full”.

By: Jeremy Cole
http://www.xaprb.com/blog/2012/09/14/how-to-free-15gb-of-disk-space-in-a-tenth-of-a-second/#comment-20260
Wed, 19 Sep 2012 18:13:10 +0000

Raghavendra,

The ENOSPC angle is more subtle than that. Say, for example, that a system normally handles 250 x 4GB MyISAM tables, for a total of 1.0TB, on a filesystem with 1.5TB available. When each file is created (empty) with the new xfs dynamic allocation scheme, each of those files reserves 8GB, and is kept open continuously (since table_cache is large enough). The system will only manage to create about 178 of these tables with 8GB preallocated before it drops below 5% free space and starts reducing the preallocation.

This will mean that the first 178 files will be able to grow to 4GB (while consuming 8GB each) but the remaining 72 of the 250 files will never be able to grow to even the nominal 4GB they should be, as they have pre-allocated at most 2GB and potentially much less.

So the file system is now “full” but may contain almost no data. Even worse, ls, stat, etc. do not show where the space went.
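
The arithmetic behind those figures can be reproduced in a few lines of Python. This is only a back-of-the-envelope check using the numbers from this comment, with 1.5TB read as 1500 decimal GB; the variable names are illustrative.

```python
capacity_gb = 1500        # 1.5TB filesystem, in decimal GB
tables = 250
nominal_gb = 4            # size each table is expected to grow to
prealloc_gb = 8           # space assumed to be reserved per open file
reserve_percent = 5       # preallocation is reduced below this much free space

# Space that can be handed out at the full 8GB before hitting the 5% threshold.
usable_gb = capacity_gb * (100 - reserve_percent) // 100
full_prealloc_tables = min(tables, usable_gb // prealloc_gb)
remaining_tables = tables - full_prealloc_tables

print(full_prealloc_tables)                 # 178
print(remaining_tables)                     # 72
print(full_prealloc_tables * prealloc_gb)   # 1424 (GB reserved)
print(full_prealloc_tables * nominal_gb)    # 712 (GB of data those files can hold)
```

In other words, about 1424GB is reserved for files that will only ever hold about 712GB of data.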

Regards,

Jeremy

By: Raghavendra Prabhu
http://www.xaprb.com/blog/2012/09/14/how-to-free-15gb-of-disk-space-in-a-tenth-of-a-second/#comment-20259
Wed, 19 Sep 2012 12:24:04 +0000

It looks like allocsize is set explicitly to 128M in that case, but in the default case it is indeed 64k. It can be decreased to as little as 4k, though.

But dynamic speculative pre-allocation is turned off when allocsize is set.

However, it is quite easy to free up the ‘allocated’ space: dropping the filesystem cache (/proc/sys/vm/drop_caches) should fix it, since no actual I/O is done in this case.
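
For reference, dropping the cache amounts to writing a small integer to that proc file as root. The sketch below is a hypothetical Python helper around that write; whether it actually releases the speculative preallocation on a given kernel is the suggestion made above, not something the sketch verifies.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"   # Linux only; writing requires root


def drop_caches(mode=3, path=DROP_CACHES):
    """Ask the kernel to drop clean caches.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    The `path` parameter exists only so the helper can be exercised on a plain file.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()                      # write out dirty data first so more can be dropped
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling `drop_caches()` with the defaults as root asks the kernel to drop both the page cache and the dentry/inode caches.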

Another way to free it is to close and reopen the file, which is what FLUSH TABLES did in this case.

The manual equivalent of preallocation is fallocate (which is required if space is managed explicitly with O_DIRECT).
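
As an illustration of explicit preallocation, here is a small Python sketch using `os.posix_fallocate`, which wraps posix_fallocate(3) rather than the Linux-specific fallocate(2) call. The `preallocate` helper is hypothetical.

```python
import os


def preallocate(path, nbytes):
    """Create `path` (if needed) and reserve nbytes of space for it up front."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, nbytes)   # raises OSError (e.g. ENOSPC) on failure
    finally:
        os.close(fd)
    return os.stat(path).st_size
```

Unlike speculative preallocation, space reserved this way is visible in the file size straight away, and a shortage shows up as an error at allocation time rather than later.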

This is beneficial for append-only workloads, since it can reduce fragmentation, and it works only with non-direct I/O (which explains why InnoDB is not affected if innodb_flush_method is set to O_DIRECT).

The ENOSPC angle to this is interesting indeed. But there is also the other angle of filesystems not performing well close to ENOSPC or at ENOSPC (there are a couple of tests where some filesystems fail / used to fail at this point).
