Xaprb

Stay curious!

An easy way to run many tasks in parallel


Domas Mituzas mentioned this recently. It’s so cool I just have to write about it. Here’s an easy command to fork off a bunch of jobs in parallel: xargs.

seq 10 20 | xargs -n 1 -P 5 sleep

This sends the numbers 10 through 20 to xargs, which passes them to sleep one argument at a time (-n 1) and keeps up to 5 processes running in parallel (-P 5), starting the next one whenever a slot frees up. You can see it in action:

$ ps -eaf | grep sleep
baron     5830  5482  0 11:12 pts/2    00:00:00 xargs -n 1 -P 5 sleep
baron     5831  5830  0 11:12 pts/2    00:00:00 sleep 10
baron     5832  5830  0 11:12 pts/2    00:00:00 sleep 11
baron     5833  5830  0 11:12 pts/2    00:00:00 sleep 12
baron     5834  5830  0 11:12 pts/2    00:00:00 sleep 13
baron     5835  5830  0 11:12 pts/2    00:00:00 sleep 14
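A quick sanity check, as a minimal sketch: time four one-second sleeps with and without -P. The four-item input and the variable names (start, serial, parallel) are made up for illustration; only the -n and -P options come from the command above.

```shell
# Four one-second sleeps, first one after another, then up to four at once.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 sleep          # serial: one sleep at a time
serial=$(( $(date +%s) - start ))

start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 4 sleep     # parallel: up to 4 at a time
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

The serial run should take about four seconds and the parallel run about one, since all four sleeps overlap.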

There are basically unlimited uses for this!

Written by Xaprb

May 1st, 2009 at 11:17 am

10 Responses to 'An easy way to run many tasks in parallel'


  1. That is awesome, thanks Baron!

    I am currently setting up Nagios and one of the things I wanted to test the alert for was a number of processes running. This is a very nice way to fork-bomb yourself and test the alert.

  2. One of my fav idioms is :

    find . -type f | grep $complex_regex | xargs some-command

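    Which is more efficient than the following, because xargs passes many files to each some-command invocation instead of starting one process per file:

    find . -name "$glob" -exec some-command '{}' \;

    If the file names contain spaces: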

    find . -print0 | xargs -0 some-command

    Leolo

    3 May 09 at 10:34 am
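To see why the -print0 / -0 pairing matters, here is a small sketch that counts how many arguments xargs actually sees. The scratch directory and file names are hypothetical, chosen only to include spaces.

```shell
# Scratch directory with awkward file names (illustrative names only).
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/two  spaces.txt"

# Newline-delimited input: xargs also splits on blanks, so names break apart.
naive=$(find "$dir" -type f | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited input: each file name arrives as exactly one argument.
safe=$(find "$dir" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "plain pipe saw $naive arguments, -print0/-0 saw $safe"
rm -rf "$dir"
```

With three files on disk, the NUL-delimited pipeline reports three arguments, while the plain pipe reports more because each name containing a space is split in two.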

  3. Here’s another way:

    $ for i in `seq 10 20`;do sleep $i & done

    Richard

    7 May 09 at 11:33 am

  4. @Richard, I think you’re missing the fact that xargs will limit the number of processes it allows to run concurrently, in a process-pool style. Bump up the "20" to "20000" and you’ll see the difference pretty quickly ;)

    Great tip Baron! I had absolutely no idea this was possible.

    Justin Mason

    25 May 09 at 5:10 am

  5. Pádraig Brady

    26 May 09 at 5:40 am

  6. @Richard @Justin: another benefit of xargs is that it will block execution until all jobs are complete.

    e.g. a script which needs to create two very large file systems, and depends on both being completed before proceeding, is easy to make parallel with xargs, but would be much more complicated using bash ‘&’ forking.

    Paul Annesley

    1 Jul 09 at 2:13 am
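The blocking behaviour is easy to check with a sketch like the one below. The log file, the a/b/c job names, and the "done" messages are assumptions for illustration. For comparison, the background-job version is shown with an explicit wait, which is what it needs to give the same guarantee.

```shell
log=$(mktemp)

# xargs returns only after every job has exited, so the log is complete here.
# In the sh -c script, $1 is the log path and $2 is the item appended by xargs.
printf '%s\n' a b c |
  xargs -n 1 -P 3 sh -c 'sleep 1; echo "done $2" >> "$1"' _ "$log"
after_xargs=$(wc -l < "$log" | tr -d ' ')

# The '&' version has to call wait before it can rely on the results.
: > "$log"
for x in a b c; do
  ( sleep 1; echo "done $x" >> "$log" ) &
done
wait
after_wait=$(wc -l < "$log" | tr -d ' ')

echo "xargs: $after_xargs lines, loop + wait: $after_wait lines"
rm -f "$log"
```

Both variants end up with three log lines. The difference is that xargs also caps how many jobs run at once, which the plain loop does not.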

  7. [...] from this link I saw how several commands can be launched in parallel, which, combined with our new tweeting script, lets us generate tweets with 0, 1…. [...]

  8. A useful feature, but beware if you need to rely on the output from the parallel commands, because partial lines of output from different jobs will step on each other. See the following thread for examples & workarounds:
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696

    Here’s another example & workaround:
    cat <<'EOF' >test.sh
    #!/bin/bash
    count=`echo $* | wc -w`
    sleep `expr $count % 10` # simulate processing
    echo -n $count " "
    for w in $*; do
    count=`echo -n $w | wc -c`
    sleep `expr $count % 10` # simulate more processing
    echo -n $count " "
    done
    echo $*
    EOF
    chmod +x test.sh

    cat <<EOF >test.data
    testing 123
    the quick brown fox
    jumps over the lazy dog
    hello world
    foo bar baz
    EOF

    Run 1: no parallelism; process data line by line

    $ time cat test.data | xargs -L1 ./test.sh
    2 7 3 testing 123
    4 3 5 5 3 the quick brown fox
    5 5 4 3 4 3 jumps over the lazy dog
    2 5 5 hello world
    3 3 3 3 foo bar baz

    real 1m20.253s
    user 0m0.060s
    sys 0m0.180s

    Run 2: First try at parallelism; runs faster, but output isn’t usable

    $ time cat test.data | xargs -L1 -P5 ./test.sh
    2 2 3 4 5 3 3 5 7 3 5 5 3 testing 123
    5 hello world
    3 foo bar baz
    4 5 3 3 the quick brown fox
    4 3 jumps over the lazy dog

    real 0m24.096s
    user 0m0.059s
    sys 0m0.178s

    Run 3: Use sed to buffer line output, prepend each line w/ pid that processed the line

    $ time cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -
    88984:2 7 3 testing 123
    88987:2 5 5 hello world
    88988:3 3 3 3 foo bar baz
    88985:4 3 5 5 3 the quick brown fox
    88986:5 5 4 3 4 3 jumps over the lazy dog

    real 0m24.112s
    user 0m0.064s
    sys 0m0.192s

    Unlike the perl "parallel" or "annotate" workarounds, using sed doesn’t handle the stderr problem, but you could easily write a similar wrapper script which writes stderr to a tmp file and buffers stdout via sed.

    Conrad

    6 Dec 09 at 11:53 pm
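One possible shape for such a wrapper, as a rough sketch rather than a tested tool: it runs the command, collects both stdout and stderr in a temp file, and only then prints the result with its PID as a prefix. The script name buffered.sh, the scratch directory, and the demo job are all hypothetical.

```shell
work=$(mktemp -d)

# Hypothetical wrapper script (buffered.sh is an illustrative name).
cat > "$work/buffered.sh" <<'EOF'
#!/bin/sh
# Run the given command, hold its stdout and stderr in a temp file,
# then print whole lines prefixed with this wrapper's PID.
tmp=$(mktemp) || exit 1
"$@" > "$tmp" 2>&1
status=$?
sed "s/^/$$:/" "$tmp"
rm -f "$tmp"
exit "$status"
EOF

# Each demo job writes a partial line, pauses, finishes it, then warns on stderr.
out=$(printf '%s\n' one two three |
  xargs -n 1 -P 3 sh "$work/buffered.sh" \
    sh -c 'printf "start %s " "$1"; sleep 1; echo end; echo "warn $1" >&2' _)

echo "$out"
rm -rf "$work"
```

Because nothing is printed until a job has finished, the "start ... end" lines come out whole even though the three jobs overlap. The order of jobs in the output is still not guaranteed.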

  9. @Conrad, Good tip. Note there is a stdbuf command now in coreutils that can be used to line buffer output like:

    stdbuf -oL ./test.sh

    However, that will not work for commands that don’t use stdio, whereas your sed tip will.

    Pádraig Brady

    7 Dec 09 at 6:02 am

  10. Parallel https://savannah.nongnu.org/projects/parallel/ fixes the problem of STDOUT and STDERR mixing from different commands. So this works fine:

    (echo foss.org.my; echo http://www.debian.org; echo http://www.freenetproject.org) | parallel traceroute

    In my personal opinion this is easier to read:

    cat test.data | parallel ./test.sh

    than this:

    cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -

    Parallel also deals nicely with filenames containing obscure characters (spaces, quotes, tabs, parentheses, greater-than, less-than, and the like), even without -print0.

    Parallel can run no_of_cpus jobs in parallel (use -j+0).

    Parallel can keep the order of the output, so output of the second job can be postponed till the first job is done (use -k).

    Parallel has support for context replace, so you can create the arguments from a template like pict{}.jpg.

    Ole Tange

    27 Jan 10 at 7:45 pm
