Comments on: An easy way to run many tasks in parallel http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/ Stay curious! Fri, 10 May 2013 18:25:19 +0000

By: speeding up postgres onlinebackup compression « itwik's Blog http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-18238 Fri, 30 Apr 2010 10:26:12 +0000

[...] Recently I stumbled over this blog entry where the benefits of xargs -P are outlined. In case you don’t know about -P yet, it allows [...]

By: zillablog http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-18123 Sat, 10 Apr 2010 16:56:33 +0000

watch for momentary monitoring…

One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momenta…

By: Ole Tange http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-17699 Wed, 27 Jan 2010 23:45:40 +0000

Parallel https://savannah.nongnu.org/projects/parallel/ fixes the problem of STDOUT and STDERR mixing from different commands. So this works fine:

(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) | parallel traceroute

In my personal opinion this is easier to read:

cat test.data | parallel ./test.sh

than this:

cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -

Parallel also deals nicely with filenames containing obscure characters (spaces, quotes, tabs, parentheses, greater-than, less-than and the like), even without -print0.

Parallel can run no_of_cpus jobs in parallel (use -j+0).

Parallel can keep the order of the output, so output of the second job can be postponed till the first job is done (use -k).

Parallel has support for context replace, so you can create the arguments from a template like pict{}.jpg
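
For comparison, two of these features can be approximated with plain xargs. This is only a sketch and not how Parallel itself works: make_names is a made-up helper name, and the CPU count relies on the assumption that `getconf _NPROCESSORS_ONLN` is available (the sketch falls back to 1 job otherwise).

```shell
#!/bin/sh
# Sketch: approximate "one job per CPU" and template-style arguments with xargs.

# Number of online CPUs (assumption: getconf knows _NPROCESSORS_ONLN).
ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
# Fall back to a single job if the answer is empty or not a plain number.
case $ncpu in
    ''|*[!0-9]*) ncpu=1 ;;
esac

# make_names (hypothetical helper): reads one token per line on stdin and
# substitutes each into the template pict{}.jpg, running up to $ncpu jobs at once.
make_names() {
    xargs -P "$ncpu" -I{} echo "pict{}.jpg"
}

# Demo: with -P the completion order is not guaranteed, so sort the result.
printf '1\n2\n3\n' | make_names | sort
```

Unlike Parallel's -k, nothing here keeps the output in input order, which is why the demo sorts it.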

By: Pádraig Brady http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-17370 Mon, 07 Dec 2009 10:02:06 +0000

@Conrad, Good tip. Note there is a stdbuf command now in coreutils that can be used to line buffer output like:

stdbuf -oL ./test.sh

However, that will not work for commands that don’t use stdio, whereas your sed tip will.

By: Conrad http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-17369 Mon, 07 Dec 2009 03:53:29 +0000

A useful feature, but beware if you need to rely on the output from the parallel commands, as partial line output will step on each other. See the following thread for examples & workarounds:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696

Here’s another example & workaround:
cat <<'EOF' >test.sh
#!/bin/bash
count=`echo $* | wc -w`
sleep `expr $count % 10` # simulate processing
echo -n $count " "
for w in $*; do
count=`echo -n $w | wc -c`
sleep `expr $count % 10` # simulate more processing
echo -n $count " "
done
echo $*
EOF
chmod +x test.sh

cat <<EOF >test.data
testing 123
the quick brown fox
jumps over the lazy dog
hello world
foo bar baz
EOF

Run 1: no parallelism; process data line by line

$ time cat test.data | xargs -L1 ./test.sh
2 7 3 testing 123
4 3 5 5 3 the quick brown fox
5 5 4 3 4 3 jumps over the lazy dog
2 5 5 hello world
3 3 3 3 foo bar baz

real 1m20.253s
user 0m0.060s
sys 0m0.180s

Run 2: First try at parallelism; runs faster, but output isn’t usable

$ time cat test.data | xargs -L1 -P5 ./test.sh
2 2 3 4 5 3 3 5 7 3 5 5 3 testing 123
5 hello world
3 foo bar baz
4 5 3 3 the quick brown fox
4 3 jumps over the lazy dog

real 0m24.096s
user 0m0.059s
sys 0m0.178s

Run 3: Use sed to buffer line output, prepend each line w/ pid that processed the line

$ time cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -
88984:2 7 3 testing 123
88987:2 5 5 hello world
88988:3 3 3 3 foo bar baz
88985:4 3 5 5 3 the quick brown fox
88986:5 5 4 3 4 3 jumps over the lazy dog

real 0m24.112s
user 0m0.064s
sys 0m0.192s
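
Note that the lines in Run 3 come back in completion order, not input order. If input order matters, one possible approach (a sketch, not something taken from this thread) is to tag each job's output with its input line number and sort on that tag afterwards. ordered_run is a hypothetical helper name, and the stable sort option `sort -s` is assumed to be available, as it is in GNU and BSD sort.

```shell
#!/bin/sh
# ordered_run COMMAND (hypothetical helper, for illustration only):
# read lines on stdin, run COMMAND once per non-blank line (up to 5 at a time)
# with the line's words as arguments, and print the results in input order.
#
# Pipeline stages:
#   1. awk prefixes each non-blank line with its line number.
#   2. xargs starts the jobs; inside the sh -c script, $0 is COMMAND,
#      $1 is the line number and the remaining arguments are the words.
#      sed tags every output line with "<line number>:".
#   3. sort orders by the numeric tag (-s keeps each job's own lines in order).
#   4. cut strips the tag again.
ordered_run() {
    awk 'NF { print NR, $0 }' |
    xargs -L1 -P5 sh -c 'n=$1; shift; "$0" "$@" | sed "s/^/$n:/"' "$1" |
    sort -s -t: -k1,1n |
    cut -d: -f2-
}

# Usage with the files from this thread:
#   ordered_run ./test.sh < test.data
```

Because the sort has to see every line first, nothing is printed until the last job has finished.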

Unlike the perl “parallel” or “annotate” workarounds, using sed doesn’t handle the stderr problem, but you could easily write a similar wrapper script which writes stderr to a tmp file and buffers stdout via sed.
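
A minimal sketch of such a wrapper, under a few assumptions: the name run_tagged, the file name wrap.sh and the "stderr:" marker are made up for illustration, mktemp is assumed to be available, and the wrapped command's exit status is not preserved.

```shell
#!/bin/sh
# run_tagged COMMAND [ARGS...] (hypothetical wrapper, for illustration only).
# stdout is piped through sed, which prefixes each line with this shell's PID.
# stderr is collected in a temp file and replayed with a prefix after COMMAND
# has finished, so it is emitted in whole lines as well.
# Note: the exit status of COMMAND is lost (the pipeline reports sed's status).
run_tagged() {
    errfile=$(mktemp) || return 1
    "$@" 2>"$errfile" | sed "s/^/$$:/"
    sed "s/^/$$:stderr:/" "$errfile" >&2
    rm -f "$errfile"
}

# When saved as a script (say wrap.sh) and called with arguments, act as the
# wrapper, so that it can be used as:
#   cat test.data | xargs -L1 -P5 ./wrap.sh ./test.sh
if [ "$#" -gt 0 ]; then
    run_tagged "$@"
fi
```

Each xargs job runs the script in its own process, so the PID prefix still identifies which job produced a line, as in Run 3.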
