Does it happen to you that you cp a big, big file (say, similar in order of magnitude to the amount of RAM) and the system becomes rather unusable?

It looks like Linux is saying "let's cache this", and as you copy, it tries to free more and more RAM in order to cache the big file you're copying. In the end, all the RAM is full of file data that you are not going to need.

How badly this hurts varies according to how /proc/sys/vm/swappiness is set.

I learnt about posix_fadvise and I tried to play with it. The result is this preloadable library that hooks into open(2) and fadvises everything as POSIX_FADV_DONTNEED.

It is all rather awkward. Calling fadvise that way discards existing cache pages if the file is already cached, which is too much. Ideally one would like to say "don't cache this because of me" without stepping on the toes of other system activities.
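One way to approximate "don't step on existing caches" is to check whether any of the file is already in the page cache before advising. Below is a minimal sketch, assuming Linux's mincore(2). The helper names cached_pages and advise_if_uncached are made up for illustration and are not part of the nocache library. It only works on descriptors opened for reading, and what mincore reports may depend on the kernel version and on file permissions.

```c
#define _DEFAULT_SOURCE 1
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: number of pages of fd currently resident in the
 * page cache, 0 for an empty file, or -1 on error. */
long cached_pages(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size == 0)
        return 0;

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;

    /* Map without touching the data, so the check itself caches nothing */
    void *map = mmap(NULL, len, PROT_NONE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long count = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* lowest bit set: page is resident */
    }
    free(vec);
    munmap(map, len);
    return count;
}

/* Hypothetical policy: advise DONTNEED only if nothing was cached before.
 * Returns 0 when it does nothing, else the posix_fadvise result. */
int advise_if_uncached(int fd)
{
    if (cached_pages(fd) != 0)
        return 0;
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

With something like this, the open(2) hook could call advise_if_uncached(res) instead of advising unconditionally, at the cost of an extra mmap and mincore per open.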

Also, I found that I need to hook into write(2) as well and run fadvise after every single write, because you can't fadvise a file that is still to be written in its entirety unless you pass fadvise the file size in advance. But the preloaded library cannot know the size of the output file, so meh.
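Since the final size is unknown, an alternative to advising on the whole file after every write would be to advise only on the byte range just written. Here is a rough sketch with a hypothetical helper, write_and_drop, assuming a seekable descriptor that was not opened with O_APPEND. POSIX_FADV_DONTNEED does not discard dirty pages, so the sketch flushes with fdatasync first, which makes every write synchronous and can be slow.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the library: write count bytes at the
 * current offset, then ask the kernel to drop only those bytes. */
ssize_t write_and_drop(int fd, const void *buf, size_t count)
{
    off_t start = lseek(fd, 0, SEEK_CUR);   /* -1 if fd is not seekable */
    ssize_t written = write(fd, buf, count);

    if (written > 0 && start >= 0) {
        /* Flush first: pages that are still dirty would not be dropped */
        if (fdatasync(fd) == 0)
            (void)posix_fadvise(fd, start, written, POSIX_FADV_DONTNEED);
    }
    return written;
}
```

On Linux, requests to discard partial pages are ignored, so small writes may leave their last page in the cache until a later call covers it.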

So, now I can run: nocache cp bigfile someplace/ without trashing the existing caches. I can also run nocache tar zxf foo.tar.gz and so on. I wish, of course, that there were no need to do so in the first place.

Here is the nocache library source code, for reference:

/*
 * nocache - LD_PRELOAD library to fadvise written files to not be cached
 *
 * Copyright (C) 2009--2010 Enrico Zini <enrico@enricozini.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

/* _GNU_SOURCE is needed for RTLD_NEXT in glibc's <dlfcn.h>; it also
 * exposes posix_fadvise */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dlfcn.h>
#include <stdarg.h>
#include <errno.h>
#include <stdio.h>

typedef int (*open_t)(const char*, int, ...);
typedef ssize_t (*write_t)(int fd, const void *buf, size_t count);

int open(const char *pathname, int flags, ...)
{
    static open_t func = 0;
    int res;
    if (!func)
        func = (open_t)dlsym(RTLD_NEXT, "open");

    // Note: I wanted to add O_DIRECT, but it imposes restrictions on buffer
    // alignment
    if (flags & O_CREAT)
    {
        // The mode argument is only present when creating the file
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        res = func(pathname, flags, mode);
        va_end(ap);
    } else
        res = func(pathname, flags);

    if (res >= 0)
    {
        int saved_errno = errno;
        // posix_fadvise returns the error number instead of setting errno,
        // so store it in errno for %m to print the right message
        int z = posix_fadvise(res, 0, 0, POSIX_FADV_DONTNEED);
        if (z != 0)
        {
            errno = z;
            fprintf(stderr, "Cannot fadvise on %s: %m\n", pathname);
        }
        errno = saved_errno;
    }

    return res;
}

// write(2) returns ssize_t, so the wrapper has to match
ssize_t write(int fd, const void *buf, size_t count)
{
    static write_t func = 0;
    ssize_t res;
    if (!func)
        func = (write_t)dlsym(RTLD_NEXT, "write");

    res = func(fd, buf, count);

    if (res > 0)
    {
        int saved_errno = errno;
        // Offset 0 and length 0 mean "the whole file as written so far"
        int z = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (z != 0)
        {
            errno = z;
            fprintf(stderr, "Cannot fadvise during write: %m\n");
        }
        errno = saved_errno;
    }

    return res;
}

Updates

Steve Schnepp writes:

Robert Love did an O_STREAMING patch for 2.4. It wasn't merged in 2.6 since POSIX_FADV_NOREUSE should be used instead.

But unfortunately it's currently mapped either as WILLNEED or as a noop.
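For reference, this is roughly what using POSIX_FADV_NOREUSE would look like from an application that reads a file once. It is only a sketch: open_streaming is a hypothetical name, and on kernels where the advice is a noop, as described above, the call is accepted but changes nothing.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>

/* Hypothetical helper: open a file that will be read once, front to back,
 * and tell the kernel its data will not be reused. Returns the file
 * descriptor, or -1 if open(2) fails. */
int open_streaming(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Offset 0 and length 0 cover the whole file. The result is ignored
     * because this is only a hint. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    return fd;
}
```

Unlike the DONTNEED approach, this leaves it to the kernel to decide what to do with pages that were already cached.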

It seems that a Google Code project has sprung up to control this.


2010-03-08 13:26:28+01:00