<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>Werx Limited &#187; parallel processing</title> <atom:link href="http://werxltd.com/wp/tag/parallel-processing/feed/" rel="self" type="application/rss+xml" /><link>http://werxltd.com/wp</link> <description>We make IT work.</description> <lastBuildDate>Mon, 07 May 2012 18:40:10 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.2</generator> <item><title>Process forking and threading with PHP</title><link>http://werxltd.com/wp/2010/08/23/process-forking-with-php/</link> <comments>http://werxltd.com/wp/2010/08/23/process-forking-with-php/#comments</comments> <pubDate>Mon, 23 Aug 2010 12:00:49 +0000</pubDate> <dc:creator>wes</dc:creator> <category><![CDATA[php]]></category> <category><![CDATA[software development]]></category> <category><![CDATA[distributed processing]]></category> <category><![CDATA[parallel processing]]></category> <category><![CDATA[php threads]]></category> <category><![CDATA[pid]]></category> <category><![CDATA[process id]]></category> <category><![CDATA[serializing]]></category> <category><![CDATA[shared memory]]></category> <category><![CDATA[threading]]></category><guid
isPermaLink="false">http://werxltd.com/wp/?p=663</guid> <description><![CDATA[I&#8217;ve been working on a rather large web application which is responsible for combining data from a variety of sources and presenting the data to the end user in a clean, unified fashion. During this process we sometimes run into cases where multiple related calls are made, each to perform some transformative work on a [...]]]></description> <content:encoded><![CDATA[<p>I&#8217;ve been working on a rather large web application which is responsible for combining data from a variety of sources and presenting the data to the end user in a clean, unified fashion. During this process we sometimes run into cases where multiple related calls are made, each to perform some transformative work on a single set of data. We decided these calls could be made in a more parallel fashion and as such started looking into ways of parallelizing PHP so that relatively expensive operations could be performed at the same time and then the results combined in the end.</p><p>We examined a few possible solutions such as <a
href="http://gearman.org/">Gearman</a>, <a
href="http://webforumz.com/php/12595-multithreaded-php.htm">popen</a>, and <a
href="http://www.developertutorials.com/tutorials/php/parallel-web-scraping-in-php-curl-multi-functions-375/">multi curl</a>. However, all of these methods seemed to carry more overhead than they were worth. What I really wanted to see was something more along the lines of <a
href="http://en.wikipedia.org/wiki/POSIX_Threads">POSIX threads</a> to distribute the work load and <a
href="http://en.wikipedia.org/wiki/Shared_memory">shared memory</a> for passing data between the parent and child threads.</p><p>After some searching through PHP extensions and the official documentation, I ran across <a
href="http://us3.php.net/manual/en/refs.fileprocess.process.php">PHP&#8217;s Process Control Extensions</a> suite, which contains <a
href="http://us3.php.net/manual/en/ref.pcntl.php">PCNTL functions</a>, one of which is <a
href="http://us3.php.net/manual/en/function.pcntl-fork.php">pcntl_fork</a>. Combined with PHP&#8217;s <a
href="http://us3.php.net/manual/en/ref.shmop.php">Shared Memory Functions</a>, this promises to fit the bill of inexpensive distribution of processing tasks along with low-overhead <a
href="http://en.wikipedia.org/wiki/Inter-process_communication">inter-process communication</a>.</p><p>Here is a sample proof-of-concept script. I&#8217;ll outline what it does below:</p><pre class="brush:php">$data = array();

echo "Parent PID: ".getmypid().PHP_EOL;

function forkTest(array &amp;$data) {
	$pids = array();

	$parent_pid = getmypid();

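	for($i = 0; $i &lt; 10; $i++) {
		/* Only the parent forks; each child skips the rest of the loop */
		if(getmypid() == $parent_pid) {
			$pid = pcntl_fork();
			if($pid == -1) {
				echo "Could not fork, continuing with ".count($pids)." children".PHP_EOL;
				break;
			}
			if($pid &gt; 0) {
				/* pcntl_fork() returns the new child's PID to the parent and 0 to the child */
				$pids[] = $pid;
				echo "Forked child ".$pid.", \$pids now has ".count($pids)." elements".PHP_EOL;
			}
		}
	}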

	if (getmypid() == $parent_pid) {
		/* Parent process */
		echo "Hello from parent: ".getmypid().PHP_EOL;
		array_push($data, "parent".getmypid()); 		  		 

		/* Process childrens' results as they exit */
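		while(count($pids) &gt; 0) {
			/* Block until any child exits; -1 means there are no children left */
			$pid = pcntl_waitpid(-1, $status);
			if($pid == -1) break;

			echo "Attempting to open memory with pid: ".$pid.PHP_EOL;
			/* "a" opens the child's existing segment read-only; permissions and size should be 0 */
			$shm_id = @shmop_open($pid, "a", 0, 0);

			if($shm_id) {
				$shm_data = unserialize(shmop_read($shm_id, 0, shmop_size($shm_id)));
				/* Mark the segment for removal so it does not outlive the script */
				shmop_delete($shm_id);
				shmop_close($shm_id);

				if(is_array($shm_data)) $data = array_merge($data, $shm_data);
			} else {
				echo "No shared memory segment found for pid: ".$pid.PHP_EOL;
			}

			/* Hunt down and remove pid entry */
			$key = array_search($pid, $pids);
			if($key !== false) unset($pids[$key]);
		}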

		echo "All children exited, \$data now has:".count($data)." elements".PHP_EOL;
		$pids = array();
	} else {
		/* Child processes */
		$pdata = array();
		echo "Hello from child: ".getmypid().PHP_EOL;
		array_push($pdata, "child".getmypid());
		$data_str = serialize($pdata);

		$shm_id = shmop_open(getmypid(), "c", 0644, strlen($data_str));
		if (!$shm_id) {
			echo "Couldn't create shared memory segment".PHP_EOL;
		} else {
			if(shmop_write($shm_id, $data_str, 0) != strlen($data_str)) {
				echo "Couldn't write shared memory data".PHP_EOL;
			}
		}

		/* Simulate work that takes a random 1-10 seconds */
		sleep(rand(1,10));
		exit(0);
	}
}

/* Run the test 10 times */
for($f = 0; $f &lt; 10; $f++) {
	echo "Running $f forkTest()".PHP_EOL;
	forkTest($data);
}
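
/* Optional sanity check, assuming every fork and every shared memory read succeeded:
 * each forkTest() run adds 1 parent entry plus 1 entry per child (10 children),
 * so 10 runs should leave 10 * (1 + 10) = 110 elements in $data. */
$expected = 10 * (1 + 10);
if(count($data) !== $expected) {
	echo "Expected ".$expected." elements but found ".count($data).PHP_EOL;
}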

echo "Fork test finished, \$data now contains ".count($data)." elements".PHP_EOL;
echo "\$data:".PHP_EOL.json_encode($data);</pre><p>This code defines a function that forks 10 child worker processes. Because pcntl_fork gives each child its own copy of the parent&#8217;s memory rather than a reference to it, the children cannot add to the parent&#8217;s $data array directly. Instead, each child pushes a string element containing its own <a
href="http://en.wikipedia.org/wiki/Process_identifier">process identifier</a> into a local array, serializes that array, and writes it into a shared memory segment, using its process ID as the shared memory key. The parent process waits for each child to exit, reads that child&#8217;s data from shared memory, deletes the segment, and merges the results into the master $data array. My test application runs this function 10 times to demonstrate how <a
href="http://www.electrictoolbox.com/article/php/process-forking/">forking in PHP</a> can be safe and memory efficient. Each run adds one parent entry plus ten child entries, so after 10 runs the $data array should contain 10 &#215; (1 + 10) = 110 elements. I&#8217;ve thrown in sleep calls with a random duration between 1 and 10 seconds to show how child processes can finish at different times.</p><p>No doubt optimizations can be made, but this should serve as at least a rudimentary example of efficient, fork-based parallelism in PHP. That holds provided the work you are planning to do is worth the overhead (which, small as it may be, still exists and should be factored in), and provided you do not mind locking your application into a POSIX environment (the above code will not work on Windows platforms).</p>]]></content:encoded> <wfw:commentRss>http://werxltd.com/wp/2010/08/23/process-forking-with-php/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Diskless computing vs distributed computing</title><link>http://werxltd.com/wp/2009/09/03/diskless-computing-vs-distributed-computing/</link> <comments>http://werxltd.com/wp/2009/09/03/diskless-computing-vs-distributed-computing/#comments</comments> <pubDate>Thu, 03 Sep 2009 14:56:35 +0000</pubDate> <dc:creator>wes</dc:creator> <category><![CDATA[general]]></category> <category><![CDATA[it industry]]></category> <category><![CDATA[cloud computing]]></category> <category><![CDATA[cluster computing]]></category> <category><![CDATA[diskless computing]]></category> <category><![CDATA[distributed computing]]></category> <category><![CDATA[headless computing]]></category> <category><![CDATA[network administration]]></category> <category><![CDATA[parallel processing]]></category> <category><![CDATA[seti@home]]></category> <category><![CDATA[terminal server]]></category><guid
isPermaLink="false">http://werxltd.com/wp/?p=197</guid> <description><![CDATA[A friend of mine recently asked me about cloud computing: what it is, and what it means for where technology is headed in the coming years. His question showed a common confusion between cloud computing and diskless computing. Both of these are interesting areas of computer [...]]]></description> <content:encoded><![CDATA[<p>A friend of mine recently asked me about cloud computing: what it is, and what it means for where technology is headed in the coming years. His question showed a common confusion between cloud computing and diskless computing.</p><p>Both of these are interesting areas of computer science. They do sometimes overlap, and both are going to change computing in significant ways as time goes on, but they are not the same.</p><p>Here are the differences to help you tell them apart.</p><h3>Diskless computing</h3><p><a
href="http://en.wikipedia.org/wiki/Diskless_node">Diskless computing</a> is best exemplified by the <a
href="http://ltsp.org/">Linux Terminal Server Project</a> (an excellent project; I&#8217;ve used it to deploy over 150 diskless workstations at one company) and Microsoft&#8217;s pathetic rival, <a
href="http://www.microsoft.com/windowsserver2003/technologies/terminalservices/default.mspx">Windows Terminal Services</a>. Sun has its <a
href="http://www.sun.com/desktop/sun-ray-clients.jsp">own solution</a> as well, and there are countless third-party utilities, but the basic idea behind them all is that you have one big computer (or a series of computers) that all these &#8220;headless&#8221; computers connect to in order to retrieve an operating system, store files, etc. For large networks this model is absolutely amazing.</p><h3>Cloud Computing</h3><p><a
href="http://en.wikipedia.org/wiki/Cloud_computing">Cloud computing</a>, on the other hand, starts from a large problem that requires a lot of computing power to solve. Rather than buying bigger and bigger hardware, we&#8217;ve found (going back to <a
href="http://www.cray.com/Home.aspx">Cray supercomputers</a>) that it is far better to split the problem into smaller chunks and push them through multiple processors at once than to have a single processor handle everything. This is called <a
href="http://en.wikipedia.org/wiki/Distributed_computing">distributed computing</a>.</p><p>You might have heard of one of the major platforms for this type of computing, <a
href="http://www.beowulf.org/">Beowulf</a>, from the popular <a
href="http://en.wikipedia.org/wiki/Internet_meme">internet meme</a> &#8220;imagine a beowulf cluster of&#8230;&#8221;. Another very popular distributed computing platform (popular because it is far easier to install, operate, and write code for than Beowulf) is <a
href="http://werxltd.com/wp/2009/08/26/getting-starte-with-hadoop-and-mapreduce/">Hadoop</a>. Hadoop is inspired by Google&#8217;s implementation of the MapReduce programming model, and because it is written in Java it is very portable.</p><h3>Projects using Cloud Computing</h3><p>Parallel processing is used today in a wide variety of settings, including:</p><ul><li>3D rendering farms at companies such as Disney&#8217;s Pixar</li><li>web indexing at Google, Yahoo, Microsoft, etc.</li><li><a
href="http://werxltd.com/wp/2009/08/31/an-introduction-to-statistics-and-data-mining/">data mining</a> of all sorts at companies like Wal-Mart</li></ul><h3>Join in!</h3><p>There are some very popular distributed computing projects that anyone with CPU cycles to spare is encouraged to join, such as:</p><ul><li><a
href="http://setiathome.ssl.berkeley.edu/">SETI@home</a>, where you can help process data that might reveal extraterrestrial signals</li><li><a
href="http://folding.stanford.edu/">Folding@home</a>, where you can help search for cures for various diseases</li><li><a
href="http://genomeathome.stanford.edu/">Genome@home</a>, a sister project to the Folding@home project above that worked on protein design</li><li><a
href="http://www.boingboing.net/2004/05/26/shrekhome-bluesky-pr.html">Shrek@home</a>, a pioneering project that a few of us got to participate in</li><li><a
href="http://www.friedbeef.com/9-world-changing-projects-that-your-computer-can-participate-in/">others</a>, including <a
href="http://fightaidsathome.scripps.edu/">fightaids@home</a> to help fight AIDS and <a
href="http://lhcathome.cern.ch/">lhc@home</a> to help run simulations for <a
href="http://en.wikipedia.org/wiki/Large_Hadron_Collider">CERN&#8217;s Large Hadron Collider</a></li></ul><p>So while diskless computing and cloud computing can overlap in some areas (I configured the LTSP network I mentioned earlier to assist with the genome@home project when the systems were idle), they aren&#8217;t necessarily tied together.</p>]]></content:encoded> <wfw:commentRss>http://werxltd.com/wp/2009/09/03/diskless-computing-vs-distributed-computing/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> </channel> </rss>
