An easy way to run many tasks in parallel
Domas Mituzas mentioned this recently. It’s so cool I just have to write about it. Here’s an easy command to fork off a bunch of jobs in parallel: xargs.
seq 10 20 | xargs -n 1 -P 5 sleep
This sends the numbers 10 through 20 to xargs, which hands them out one argument at a time (-n 1) and keeps up to 5 sleep processes running in parallel (-P 5), starting a new one whenever one finishes. You can see it in action:
$ ps -eaf | grep sleep
baron 5830 5482 0 11:12 pts/2 00:00:00 xargs -n 1 -P 5 sleep
baron 5831 5830 0 11:12 pts/2 00:00:00 sleep 10
baron 5832 5830 0 11:12 pts/2 00:00:00 sleep 11
baron 5833 5830 0 11:12 pts/2 00:00:00 sleep 12
baron 5834 5830 0 11:12 pts/2 00:00:00 sleep 13
baron 5835 5830 0 11:12 pts/2 00:00:00 sleep 14
There are basically unlimited uses for this!
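A quick way to convince yourself that -P really is a pool, sketched under the assumption that your xargs supports -P (GNU and BSD versions do): feed it six one-second sleeps with at most three running at once. The variable names are just for illustration.

```shell
# Six 1-second jobs, at most 3 at a time: two "waves", so roughly 2 seconds,
# instead of roughly 6 seconds when run one after another.
start=$(date +%s)
printf '%s\n' 1 1 1 1 1 1 | xargs -n 1 -P 3 sleep
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

Change -P 3 to -P 1 and the same input should take about three times as long.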



That is awesome, thanks Baron!
I am currently setting up Nagios and one of the things I wanted to test the alert for was a number of processes running. This is a very nice way to fork-bomb yourself and test the alert.
Peter Sankauskas
1 May 09 at 12:18 pm
One of my fav idioms is :
find . -type f | grep "$complex_regex" | xargs some-command
Which is more efficient than the following, which runs some-command once per file :
find . -name "$glob" -exec some-command '{}' \;
If the files have spaces :
find . -print0 | xargs -0 some-command
Leolo
3 May 09 at 10:34 am
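To see why -print0 and -0 matter for names with spaces, here is a small sketch. The temporary directory and file names are made up for the demo, and it assumes the temp directory path itself contains no whitespace.

```shell
# Create three files, two of which have spaces in their names.
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/with space.txt" "$workdir/two  spaces.txt"

# Newline-delimited input: xargs splits on whitespace, so names break apart.
naive=$(find "$workdir" -type f | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited input: exactly one argument per file.
safe=$(find "$workdir" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "naive: $naive arguments, safe: $safe arguments"
rm -rf "$workdir"
```

The naive pipeline reports more arguments than there are files, because each name containing a space is split into several pieces.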
Here’s another way:
$ for i in `seq 10 20`;do sleep $i & done
Richard
7 May 09 at 11:33 am
@Richard, I think you’re missing the fact that xargs will limit the number of processes it allows to run concurrently, in a process-pool style. Bump up the “20” to “20000” and you’ll see the difference pretty quickly ;)
Great tip Baron! I had absolutely no idea this was possible.
Justin Mason
25 May 09 at 5:10 am
http://www.spinellis.gr/blog/20090304/
Pádraig Brady
26 May 09 at 5:40 am
@Richard @Justin: another benefit of xargs is that it will block execution until all jobs are complete.
e.g. a script which needs to create two very large file systems, and depends on both being completed before proceeding, is easy to make parallel with xargs, but would be much more complicated using bash ‘&’ forking.
Paul Annesley
1 Jul 09 at 2:13 am
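A minimal sketch of the blocking behaviour Paul describes, using two stand-in jobs that each sleep and then drop a marker file. The names a and b, the .done suffix, and the temporary directory are all hypothetical; a real script would run its file system creation commands instead.

```shell
workdir=$(mktemp -d)

# Run both jobs in parallel. xargs appends each item after "$workdir",
# so inside the inline script $1 is the directory and $2 is the job name.
printf '%s\n' a b | xargs -n 1 -P 2 sh -c 'sleep 1; touch "$1/$2.done"' _ "$workdir"

# xargs only returns once every job has exited, so both markers exist here.
ls "$workdir"
```

Anything placed after the xargs line can therefore rely on both jobs having finished.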
[...] From this link I saw how several commands can be launched in parallel, which, combined with our new tweeting script, lets us generate tweets with 0, 1…. [...]
Tweeting from the command console at lliurealbir
10 Sep 09 at 5:07 pm
A useful feature, but beware if you need to rely on the output from the parallel commands, as partial lines of output from different jobs will step on each other. See the following thread for examples & workarounds:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696
Here’s another example & workaround:
cat <<'EOF' >test.sh
#!/bin/bash
count=`echo $* | wc -w`
sleep `expr $count % 10` #simulate processing
echo -n $count " "
for w in $*; do
count=`echo -n $w | wc -c`
sleep `expr $count % 10` # simulate more processing
echo -n $count " "
done
echo $*
EOF
chmod +x test.sh
cat <<EOF >test.data
testing 123
the quick brown fox
jumps over the lazy dog
hello world
foo bar baz
EOF
Run 1: no parallelism; process data line by line
$ time cat test.data | xargs -L1 ./test.sh
2 7 3 testing 123
4 3 5 5 3 the quick brown fox
5 5 4 3 4 3 jumps over the lazy dog
2 5 5 hello world
3 3 3 3 foo bar baz
real 1m20.253s
user 0m0.060s
sys 0m0.180s
Run 2: First try at parallelism; runs faster, but output isn’t usable
$ time cat test.data | xargs -L1 -P5 ./test.sh
2 2 3 4 5 3 3 5 7 3 5 5 3 testing 123
5 hello world
3 foo bar baz
4 5 3 3 the quick brown fox
4 3 jumps over the lazy dog
real 0m24.096s
user 0m0.059s
sys 0m0.178s
Run 3: Use sed to buffer line output, prepend each line w/ pid that processed the line
$ time cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -
88984:2 7 3 testing 123
88987:2 5 5 hello world
88988:3 3 3 3 foo bar baz
88985:4 3 5 5 3 the quick brown fox
88986:5 5 4 3 4 3 jumps over the lazy dog
real 0m24.112s
user 0m0.064s
sys 0m0.192s
Unlike the perl “parallel” or “annotate” workarounds, using sed doesn’t handle the stderr problem, but you could easily write a similar wrapper script which writes stderr to a tmp file and buffers stdout via sed.
Conrad
6 Dec 09 at 11:53 pm
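One way to sketch the tmp-file idea Conrad mentions: each job writes its stdout and stderr to its own files, which are read back only after xargs returns, so nothing can interleave. The job names, the .out and .err suffixes, and the inline script standing in for test.sh are assumptions for illustration.

```shell
outdir=$(mktemp -d)

# $1 is the output directory, $2 is the item xargs appended.
# Each job prints a partial line, pauses, then finishes the line,
# which is the pattern that gets mangled when jobs share one stdout.
printf '%s\n' one two three | xargs -n 1 -P 3 sh -c '
  {
    printf "%s " start
    sleep 1
    printf "%s\n" "$2"
    echo "warning from $2" >&2
  } >"$1/$2.out" 2>"$1/$2.err"
' _ "$outdir"

# Read the per-job files back once everything has finished.
result=$(cat "$outdir"/*.out)
errors=$(cat "$outdir"/*.err)
printf '%s\n' "$result"
```

The trade-off is that no output appears until all jobs are done, and the lines come back in file-name order rather than completion order.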
@Conrad, Good tip. Note there is a stdbuf command now in coreutils that can be used to line buffer output like:
stdbuf -oL ./test.sh
However, that will not work for commands that don’t use stdio, whereas your sed tip will.
Pádraig Brady
7 Dec 09 at 6:02 am
Parallel https://savannah.nongnu.org/projects/parallel/ fixes the problem of STDOUT and STDERR mixing from different commands. So this works fine:
(echo foss.org.my; echo http://www.debian.org; echo http://www.freenetproject.org) | parallel traceroute
In my personal opinion this is easier to read:
cat test.data | parallel ./test.sh
than this:
cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -
Parallel also deals nicely with filenames containing obscure characters (spaces, quotes, tabs, parentheses, greater-than, less-than and the like), even without -print0.
Parallel can run no_of_cpus jobs in parallel (use -j+0).
Parallel can keep the order of the output, so output of the second job can be postponed till the first job is done (use -k).
Parallel has support for context replace, so you can create the arguments from a template like pict{}.jpg
Ole Tange
27 Jan 10 at 7:45 pm
watch for momentary monitoring…
One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momenta…
zillablog
10 Apr 10 at 12:56 pm
[...] Posted 30.04.2010 Filed under: Uncategorized | Recently I stumbled over this blog entry where the benefits of xargs -P are outlined. In case you don’t know about -P yet, it allows [...]
speeding up postgres onlinebackup compression « itwik's Blog
30 Apr 10 at 6:26 am