Saturday, August 29, 2009

Scaling Hadoop for Multicore and Highly Threaded Systems

Here is the latest webinar in Sun Microsystems' systems webinar series: Scaling Hadoop for Multicore and Highly Threaded Systems.

Details of this webinar, as described on the web site:

During this webinar you will learn about:

- Scale: How to use Hadoop to store and process petabytes of data
- Performance: How to maximize parallelism per node, and the results of tests varying the number of nodes, and integrating Flash memory drives
- Virtualization: How we created multiple virtual nodes using Solaris Containers
- Reliability: How Hadoop automatically maintains multiple copies of data and redeploys tasks based on failures
- Deployment options: How Hadoop can be run in the "cloud" on Amazon EC2/3 services and in compute farms and high-performance computing (HPC) environments

Hadoop is typically scaled on a large pool of commodity system nodes. However, by using multicore, multithreaded processors, you can achieve the same scale with fewer machines. In this Webinar, we will discuss how Sun's chip multithreading (CMT) technology-based UltraSPARC T2 Plus processor can process up to 256 tasks in parallel within a single node. We will also share with you how we evaluated CPU and I/O throughput, memory size, and task counts to extract maximal parallelism per single node.


An Introduction to Parallel Programming, Modules 3, 4 and 5

Tuesday, August 25, 2009

If You Love Maths and Programming ....

If you love maths and programming, you should visit Project Euler. As the web site puts it:
Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

I am warning you: Project Euler is a pretty addictive site for people who are passionate about mathematics and programming.


Extending A Simple HTTP Server in Python

My users need to transfer files from a UNIX host to their desktops. Instead of asking them to install WinSCP or an equivalent utility, I simply extended this simple Python HTTP server so that they can launch it whenever they want to transfer files via a browser. My modified script listens on a random high port and ensures that the user (unless root) owns the directory from which the server is launched. To keep anyone from running wild, it refuses to start from the root "/" directory. This simple Python script is pretty handy for ad-hoc file transfers.
#! /usr/local/bin/python
# minimal web server.  serves files relative to the current directory.
# random high port

import SimpleHTTPServer, SocketServer
import random, sys, os
import platform

# to avoid running wild: refuse to serve the whole filesystem
if os.path.realpath('.') == "/":
        print "ERROR. Cannot run under /. Run in another directory"
        sys.exit(1)

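# ensure the user owns the directory, unless root
uid = os.getuid()                 # id of the user launching the server
owner = os.stat('.').st_uid       # id of the owner of the current directory
if uid==0 or uid==owner:
        port = random.randint(50000,60000)
        url = "http://%s:%d/" % (platform.node(), port)
        Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
        httpd = SocketServer.TCPServer(("", port), Handler)
        print "Ask user to visit this URL:\n\t%s" % url
        # serve files until interrupted (Ctrl-C)
        httpd.serve_forever()
else:
        print "ERROR. You have to be either root or owner of the directory"
        sys.exit(2)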
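
The script above targets Python 2. In Python 3 the two modules were renamed to http.server and socketserver, so a rough equivalent might look like the sketch below. The function names may_serve and serve are my own for illustration and are not part of the original script; the checks assume a UNIX host where os.getuid() is available.

```python
#!/usr/bin/env python3
# Python 3 sketch of the same idea: serve the current directory on a
# random high port. may_serve() and serve() are illustrative names only.
import http.server
import os
import platform
import random
import socketserver
import sys


def may_serve(directory, uid):
    """Return (ok, message) after applying the two safety checks."""
    # to avoid running wild: never serve the root directory
    if os.path.realpath(directory) == "/":
        return False, "ERROR. Cannot run under /. Run in another directory"
    # the user must own the directory, unless root (uid 0)
    owner = os.stat(directory).st_uid
    if uid != 0 and uid != owner:
        return False, "ERROR. You have to be either root or owner of the directory"
    return True, ""


def serve():
    """Check the current directory, then serve it until interrupted."""
    ok, message = may_serve(".", os.getuid())
    if not ok:
        sys.exit(message)  # prints the message to stderr, exit status 1
    port = random.randint(50000, 60000)
    handler = http.server.SimpleHTTPRequestHandler
    with socketserver.TCPServer(("", port), handler) as httpd:
        url = "http://%s:%d/" % (platform.node(), port)
        print("Ask user to visit this URL:\n\t%s" % url)
        httpd.serve_forever()
```

Calling serve() from the directory to be shared starts the server. Keeping the checks in a separate function that takes the directory and uid as arguments makes them easy to try out without actually opening a port.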