<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: An easy way to run many tasks in parallel</title>
	<atom:link href="http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/</link>
	<description>Stay curious!</description>
	<lastBuildDate>Thu, 09 Feb 2012 09:56:43 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: speeding up postgres onlinebackup compression &#171; itwik&#39;s Blog</title>
		<link>http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-18238</link>
		<dc:creator>speeding up postgres onlinebackup compression &#171; itwik&#39;s Blog</dc:creator>
		<pubDate>Fri, 30 Apr 2010 10:26:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1056#comment-18238</guid>
		<description>[...]    Posted 30.04.2010 Filed under: Uncategorized &#124;   Recently I stumbled over this blog entry where the benefits of xargs -P are outlined. In case you don&#8217;t know about -P yet, it allows [...]</description>
		<content:encoded><![CDATA[<p>[...]    Posted 30.04.2010 Filed under: Uncategorized |   Recently I stumbled over this blog entry where the benefits of xargs -P are outlined. In case you don&#8217;t know about -P yet, it allows [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: zillablog</title>
		<link>http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-18123</link>
		<dc:creator>zillablog</dc:creator>
		<pubDate>Sat, 10 Apr 2010 16:56:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1056#comment-18123</guid>
		<description>&lt;strong&gt;watch for momentary monitoring...&lt;/strong&gt;

One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momenta...</description>
		<content:encoded><![CDATA[<p><strong>watch for momentary monitoring&#8230;</strong></p>
<p>One of the things I preach about a lot is good monitoring of your database servers; having tools in place to tell you both what good looks like and when things go bad is critical for large scale success. But sometimes you just need to monitor a momenta&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ole Tange</title>
		<link>http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-17699</link>
		<dc:creator>Ole Tange</dc:creator>
		<pubDate>Wed, 27 Jan 2010 23:45:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1056#comment-17699</guid>
		<description>Parallel &lt;a href=&quot;https://savannah.nongnu.org/projects/parallel/&quot; rel=&quot;nofollow&quot;&gt;https://savannah.nongnu.org/projects/parallel/&lt;/a&gt; fixes the problem of STDOUT and STDERR mixing from different commands. So this works fine:

(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) &#124; parallel traceroute

In my personal opinion this is easier to read:
 
cat test.data &#124; parallel ./test.sh

than this:

cat test.data &#124; xargs -L1 -P5 sh -c &#039;./test.sh $* &#124; sed &quot;s/^/$$:/&quot;&#039; -


Parallel also deals nicely with filenames containing obscure characters (spaces, quotes, tabs, parentheses, greater-than, less-than and the like), even without -print0.

Parallel can run no_of_cpus jobs in parallel (use -j+0).

Parallel can keep the output in input order, so the output of the second job is held back until the first job is done (use -k).

Parallel has support for context replace, so you can build the arguments from a template like pict{}.jpg</description>
		<content:encoded><![CDATA[<p>Parallel <a href="https://savannah.nongnu.org/projects/parallel/" rel="nofollow">https://savannah.nongnu.org/projects/parallel/</a> fixes the problem of STDOUT and STDERR mixing from different commands. So this works fine:</p>
<p>(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) | parallel traceroute</p>
<p>In my personal opinion this is easier to read:</p>
<p>cat test.data | parallel ./test.sh</p>
<p>than this:</p>
<p>cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' -</p>
<p>Parallel also deals nicely with filenames containing obscure characters (spaces, quotes, tabs, parentheses, greater-than, less-than and the like), even without -print0.</p>
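<p>For comparison, plain xargs only copes with such names when the input is NUL-separated. A minimal sketch, assuming find and xargs both support -print0/-0 (GNU and BSD versions do); the temporary directory and the file names are made up for the demo:</p>

```shell
#!/bin/sh
# Create three demo files with awkward names in a throwaway directory.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/it's (odd).txt"

# NUL-separated names survive spaces, quotes and parentheses intact;
# each job prints just the file name (everything after the last slash).
names=$(find "$dir" -type f -print0 |
  xargs -0 -n1 -P4 sh -c 'printf "%s\n" "${1##*/}"' -- | sort)
printf '%s\n' "$names"

rm -rf "$dir"
```

<p>Without -print0 and -0, xargs would split "with space.txt" into two arguments and choke on the unmatched quote in the third name.</p>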
<p>Parallel can run no_of_cpus jobs in parallel (use -j+0).</p>
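<p>With plain xargs you choose the job count yourself. A sketch assuming GNU coreutils nproc is available to report the number of CPUs; the echo job is only a placeholder:</p>

```shell
#!/bin/sh
# Run one job per CPU, roughly what "parallel -j+0" asks for.
ncpu=$(nproc)
printf '%s\n' 'testing 123' 'hello world' 'foo bar baz' |
  xargs -L1 -P "$ncpu" echo job:
```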
<p>Parallel can keep the output in input order, so the output of the second job is held back until the first job is done (use -k).</p>
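<p>Keeping output in input order takes more work with xargs alone. One way, sketched here under stated assumptions: number the input lines, have each job write to its own numbered file, and concatenate the files in order once xargs has waited for every job. The function name run_ordered and the upper-casing job with its sleep are hypothetical stand-ins for real work such as ./test.sh:</p>

```shell
#!/bin/sh
# Emulate "parallel -k" with xargs -P: results come out in input order
# even though the jobs finish in a different order.
run_ordered() {
  outdir=$(mktemp -d) || return 1
  # Emit the line number and the line text as two NUL-terminated items,
  # so xargs -0 -n2 hands each job exactly one (number, line) pair.
  i=0
  while IFS= read -r line; do
    i=$((i + 1))
    printf '%s\n%s\n' "$i" "$line"
  done | tr '\n' '\0' |
    xargs -0 -n2 -P4 sh -c '
      # $0 = output directory, $1 = line number, $2 = line text.
      # Hypothetical job: sleep 1-3 s so completion order is scrambled,
      # then upper-case the line.
      sleep $((3 - $1 % 3))
      printf "%s\n" "$2" | tr "a-z" "A-Z" > "$0/$1.out"
    ' "$outdir"
  # xargs returns only after all jobs exit; print results in input order.
  n=1
  while [ -f "$outdir/$n.out" ]; do
    cat "$outdir/$n.out"
    n=$((n + 1))
  done
  rm -rf "$outdir"
}

printf '%s\n' 'testing 123' 'hello world' 'foo bar baz' | run_ordered
```

<p>Here the second line finishes first and the third finishes last, yet the output still follows the input order.</p>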
<p>Parallel has support for context replace, so you can build the arguments from a template like pict{}.jpg</p>
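<p>xargs offers a simpler form of the same idea: with -I{} it runs one command per input line and substitutes the whole line wherever {} appears. A small sketch; the pict{}.jpg names and the echoed convert command are placeholders, nothing is actually converted:</p>

```shell
#!/bin/sh
# Build each argument from a template, like "parallel ... pict{}.jpg".
# -I{} replaces every {} in the arguments with the current input line.
out=$(printf '%s\n' 1 2 3 |
  xargs -I{} -P3 echo convert pict{}.jpg pict{}.png | sort)
printf '%s\n' "$out"
```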
]]></content:encoded>
	</item>
	<item>
		<title>By: Pádraig Brady</title>
		<link>http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-17370</link>
		<dc:creator>Pádraig Brady</dc:creator>
		<pubDate>Mon, 07 Dec 2009 10:02:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1056#comment-17370</guid>
		<description>@Conrad, good tip. Note there is now a stdbuf command in coreutils that can be used to line-buffer output, like:

stdbuf -oL ./test.sh

However, that will not work for commands that don&#039;t use stdio, whereas your sed tip will.</description>
		<content:encoded><![CDATA[<p>@Conrad, good tip. Note there is now a stdbuf command in coreutils that can be used to line-buffer output, like:</p>
<p>stdbuf -oL ./test.sh</p>
<p>However, that will not work for commands that don&#8217;t use stdio, whereas your sed tip will.</p>
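<p>A minimal sketch of the stdbuf approach under xargs -P, assuming GNU coreutils stdbuf is installed. The awk one-liner is a hypothetical stand-in for a job that writes through stdio; stdbuf -oL makes it flush after every newline, so lines from different jobs may interleave but, for short lines, should not be split:</p>

```shell
#!/bin/sh
# Each job prints "word length" for every word on its input line.
# A BEGIN-only awk program never reads its operands as files.
# stdbuf -oL switches awk's stdout to line buffering.
out=$(printf '%s\n' 'testing 123' 'the quick brown fox' 'hello world' |
  xargs -L1 -P3 stdbuf -oL awk 'BEGIN { for (i = 1; i < ARGC; i++) print ARGV[i], length(ARGV[i]) }')
printf '%s\n' "$out"
```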
]]></content:encoded>
	</item>
	<item>
		<title>By: Conrad</title>
		<link>http://www.xaprb.com/blog/2009/05/01/an-easy-way-to-run-many-tasks-in-parallel/#comment-17369</link>
		<dc:creator>Conrad</dc:creator>
		<pubDate>Mon, 07 Dec 2009 03:53:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1056#comment-17369</guid>
		<description>A useful feature, but beware if you need to rely on the output from the parallel commands, as partial lines of output from different commands will step on each other. See the following thread for examples &amp; workarounds:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696

Here&#039;s another example &amp; workaround:
cat &gt; test.sh &lt;&lt;&#039;EOF&#039;
#!/bin/bash
count=`echo $* &#124; wc -w`
sleep `expr $count % 10` #simulate processing
echo -n $count &quot; &quot;
for w in $*; do
  count=`echo -n $w &#124; wc -c`
  sleep `expr $count % 10` # simulate more processing
  echo -n $count &quot; &quot;
done
echo $*
EOF
chmod +x test.sh

cat &gt; test.data &lt;&lt;&#039;EOF&#039;
testing 123
the quick brown fox
jumps over the lazy dog
hello world
foo bar baz
EOF

Run 1: no parallelism; process data line by line

$ time cat test.data &#124; xargs -L1 ./test.sh
2  7  3  testing 123
4  3  5  5  3  the quick brown fox
5  5  4  3  4  3  jumps over the lazy dog
2  5  5  hello world
3  3  3  3  foo bar baz

real	1m20.253s
user	0m0.060s
sys	0m0.180s

Run 2: First try at parallelism; runs faster, but output isn&#039;t usable

$ time cat test.data &#124; xargs -L1 -P5 ./test.sh 
2  2  3  4  5  3  3  5  7  3  5  5  3  testing 123
5  hello world
3  foo bar baz
4  5  3  3  the quick brown fox
4  3  jumps over the lazy dog

real	0m24.096s
user	0m0.059s
sys	0m0.178s

Run 3: Use sed to buffer line output, prepending each line with the PID of the shell that processed it

$ time cat test.data &#124; xargs -L1 -P5 sh -c &#039;./test.sh $* &#124; sed &quot;s/^/$$:/&quot;&#039; --
88984:2  7  3  testing 123
88987:2  5  5  hello world
88988:3  3  3  3  foo bar baz
88985:4  3  5  5  3  the quick brown fox
88986:5  5  4  3  4  3  jumps over the lazy dog

real	0m24.112s
user	0m0.064s
sys	0m0.192s

Unlike the perl &quot;parallel&quot; or &quot;annotate&quot; workarounds, using sed doesn&#039;t handle the stderr problem, but you could easily write a similar wrapper script that writes stderr to a tmp file and buffers stdout via sed.</description>
		<content:encoded><![CDATA[<p>A useful feature, but beware if you need to rely on the output from the parallel commands, as partial lines of output from different commands will step on each other. See the following thread for examples &amp; workarounds:<br />
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696" rel="nofollow">http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518696</a></p>
<p>Here&#8217;s another example &amp; workaround:<br />
cat &gt; test.sh &lt;&lt;'EOF'<br />
#!/bin/bash<br />
count=`echo $* | wc -w`<br />
sleep `expr $count % 10` #simulate processing<br />
echo -n $count " "<br />
for w in $*; do<br />
  count=`echo -n $w | wc -c`<br />
  sleep `expr $count % 10` # simulate more processing<br />
  echo -n $count " "<br />
done<br />
echo $*<br />
EOF<br />
chmod +x test.sh</p>
<p>cat &gt; test.data &lt;&lt;'EOF'<br />
testing 123<br />
the quick brown fox<br />
jumps over the lazy dog<br />
hello world<br />
foo bar baz<br />
EOF</p>
<p>Run 1: no parallelism; process data line by line</p>
<p>$ time cat test.data | xargs -L1 ./test.sh<br />
2  7  3  testing 123<br />
4  3  5  5  3  the quick brown fox<br />
5  5  4  3  4  3  jumps over the lazy dog<br />
2  5  5  hello world<br />
3  3  3  3  foo bar baz</p>
<p>real	1m20.253s<br />
user	0m0.060s<br />
sys	0m0.180s</p>
<p>Run 2: First try at parallelism; runs faster, but output isn&#8217;t usable</p>
<p>$ time cat test.data | xargs -L1 -P5 ./test.sh<br />
2  2  3  4  5  3  3  5  7  3  5  5  3  testing 123<br />
5  hello world<br />
3  foo bar baz<br />
4  5  3  3  the quick brown fox<br />
4  3  jumps over the lazy dog</p>
<p>real	0m24.096s<br />
user	0m0.059s<br />
sys	0m0.178s</p>
<p>Run 3: Use sed to buffer line output, prepending each line with the PID of the shell that processed it</p>
<p>$ time cat test.data | xargs -L1 -P5 sh -c './test.sh $* | sed "s/^/$$:/"' --<br />
88984:2  7  3  testing 123<br />
88987:2  5  5  hello world<br />
88988:3  3  3  3  foo bar baz<br />
88985:4  3  5  5  3  the quick brown fox<br />
88986:5  5  4  3  4  3  jumps over the lazy dog</p>
<p>real	0m24.112s<br />
user	0m0.064s<br />
sys	0m0.192s</p>
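<p>The sh -c part of Run 3 relies on how sh assigns its arguments: the word after the script string ("--" here) becomes $0, the words xargs appends become $1, $2, ..., and $$ is the PID of that sh process, which is why each output line carries the PID of the job that produced it. A quick way to see the assignment, using the same xargs -L1 input style with a placeholder echo instead of ./test.sh:</p>

```shell
#!/bin/sh
# Show what the inline script receives for each input line.
out=$(printf '%s\n' 'testing 123' 'hello world' |
  xargs -L1 sh -c 'echo "\$0=$0 \$#=$# \$*=$*"' --)
printf '%s\n' "$out"
```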
<p>Unlike the perl &#8220;parallel&#8221; or &#8220;annotate&#8221; workarounds, using sed doesn&#8217;t handle the stderr problem, but you could easily write a similar wrapper script that writes stderr to a tmp file and buffers stdout via sed.</p>
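<p>A minimal sketch of that wrapper idea, under stated assumptions: the function name run_wrapped is hypothetical, and the job inside sh -c is a stand-in for ./test.sh that writes one line to stdout and one to stderr. Each job&#8217;s stdout goes through sed as in Run 3, its stderr is held in a temporary file, and the captured stderr is replayed (also PID-prefixed) only after the job finishes, so for short outputs the stderr lines of different jobs should not be split mid-line:</p>

```shell
#!/bin/sh
run_wrapped() {
  xargs -L1 -P5 sh -c '
    err=$(mktemp) || exit 1
    # Hypothetical job: word count plus words on stdout, a note on stderr.
    {
      echo "$# $*"
      echo "finished: $*" >&2
    } 2>"$err" | sed "s/^/$$:/"
    # Replay the captured stderr only after stdout is complete.
    sed "s/^/$$:stderr: /" "$err" >&2
    rm -f "$err"
  ' --
}

printf '%s\n' 'testing 123' 'the quick brown fox' 'hello world' | run_wrapped
```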
]]></content:encoded>
	</item>
</channel>
</rss>

