by Sean The RIMBoy Jewett
Wednesday July 11th, 2001
A couple of months back a question was asked at work: "What is NCSA up
to these days?" The general Internet community has not heard much from the
folks who made the reference telnet client for Mac and DOS, or Mosaic, the
first ubiquitous browser. Bringing up NCSA's website quickly answered the
question. Among other things, NCSA was hosting a conference entitled
"Linux Clusters: The HPC Revolution". The conference was held June
25-27, 2001.
Not much was on the site to begin with, just a call for papers and the date,
but it looked like it might be interesting. It was NCSA, after all. If you
could only see one conference on high-performance computing -- Linux
clusters, no less -- NCSA would be the people to see. After assessing the
costs and figuring the travel expenses from Nashville, TN, it was agreed
that attendance would be a benefit to the department. It was just a matter
of taking care of the paperwork and making the drive.
According to Mr. John Towns, the Division Director of the Scientific Computing
Division for NCSA, the conference targeted the high-performance
computing arena. While primarily an academically oriented event, the conference
also had participation from the various industries that support
high-performance computing. For a conference designed specifically for those
areas, John said they were quite surprised to have to turn away papers.
The attendance goal of 150 people was more than met, with 179 attendees.
People converged on Urbana-Champaign, Illinois, from as far away as
Denmark, Italy, Thailand, Puerto Rico, and Montana (wait, that's in the
US!).
Monday, a day reserved for tutorials, set the pace of the conference.
Attendees were presented with two tracks. Those who chose the presentation
called "Cluster-In-A-Box" quickly found out what NCSA is up to. Having
realized that the tools to help the average user build a Linux cluster
are either not available or too rough to use, NCSA, along with IBM, Intel,
ORNL and some other industry heavyweights, has teamed up to form the Open
Cluster Group. The first project of this group is OSCAR: Open Source
Cluster Application Resources.
Built on IBM's LUI (Linux Utility for cluster Installation), OSCAR aims
to make the installation of a Linux cluster as simple, yet as configurable,
as needed. Those who have used the Scyld Beowulf installer have an idea of
how easy Linux clustering can be. While OSCAR has not yet reached the ease
of installation that Scyld has achieved, the Open Cluster Group seems
driven to narrow the gap and build a better cluster.
By building upon IBM's LUI, the Open Cluster Group has a good foundation
to work from. LUI enables users to easily push out configuration
changes such as new kernels. OSCAR then adds the C3 cluster management
tool from ORNL, MPICH (Message Passing Interface) and PVM (Parallel Virtual
Machine) for message passing, and OpenSSH/SSL for secure transactions, and
manages the job queue with OpenPBS. Although the tools are based on
Red Hat 6.2, a RH 7.1 based version (OSCAR 1.1) is expected soon. The nice
part about OSCAR is that it does not lock you into certain kernels and
configurations, which Scyld's software does to a certain extent. OSCAR
can use the stock Red Hat kernel or can push out kernels of your choosing.
The Open Cluster Group wants all of the tools and software it develops and
distributes to be under an open-source license. This avoids the
complications users face with closed-source or proprietary clustering solutions.
Other nifty features of OSCAR include Ethernet PXE support to enable automated
installation of nodes via the network. The Group has also put together a
full-featured installation manual to walk users through the setup. The
manual hopefully leaves no questions unanswered, addressing a common
complaint from people unfamiliar with open-source projects. With screen
shots, the manual is twenty-four pages and comes complete with
troubleshooting suggestions.
Monday afternoon saw presentations on "Performance Tuning for IA32 and
IA64 based Clusters" from the Ohio Supercomputer Center and "Low-Cost
Linux Clusters for Biomolecular Simulations Using NAMD" from the University
of Illinois (UIUC). At some point during the day word got out about the
beast NCSA was keeping in its operations center. Of course, all of this talk
of clustering had everyone itching to see what NCSA had. NCSA staff thankfully
satisfied the appetite of the attendees and threw together a tour of their
facility. Much as with the monolith in 2001, everyone huddled around NCSA's
top-ranked cluster, dubbed Platinum. Unlike its name, but like the monolith,
this system was jet black. IBM black, to be precise: 512 nodes of dual
Pentium III power with a paltry 1.5 gigs of RAM per node, and Myricom's
Myrinet and Fast Ethernet interconnects. This cluster is built for speed.
Ranked 30th in the Top 500 Supercomputer Sites, NCSA's Platinum is currently
the fastest known unclassified Linux cluster in the world.
Tuesday marked the start of the conference proper: the presentation of papers
and the networking of minds. Dan Reed, Director of NCSA and the Alliance,
kicked off the event by highlighting the Grid and large scale computation
displays. Intel's Tim Mattson started events Wednesday morning discussing
where clustering started and where it is going. Tim admitted that he worked
on a cluster of 286's in the early 1980's and understood then the potential
we are only realizing now. Topics covered in the presentations on Tuesday and
Wednesday were fairly diverse; the schedule is at
http://www.ncsa.edu/LinuxRevolution/schedule.htm.
Putchong Uthayopas of Kasetsart University in Thailand demonstrated SCE,
a software tool for Beowulf clusters. The presentation was simply
amazing. Putchong's group demonstrated very polished and refined cluster
installation and management tools. And don't tell Putchong's group that
VRML is dead. Attendees were blown away by his group's VRML tools for polling
information on nodes. You've never seen a cluster until you have
witnessed it in 3D (in particular the filesystem). As Putchong put it,
"The people that give you the money don't necessarily understand what it
is you're doing. They do like graphics, though." Of course, Putchong
stressed that SCE is configurable to the needs of the user, and schedulers
such as PBS could be plugged in if needed.
Steven Timm gave an overview of what computing is like at Fermilab.
After hearing the volumes of data Fermilab generates, you come to realize
that terabytes of storage are quite blasé. It's the petabytes they're
after. However, it would be Princeton Assistant Professor of Geophysics
Hans-Peter Bunge who would drive home a very important lesson about building
Linux clusters: for Bunge, the point of building a faster Linux cluster is
not to run coarse simulations faster, but to run higher-resolution
simulations in a similar timeframe. His Geowulf system is a great example
of finding the best performance for the lowest price.
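Bunge's point can be put into rough numbers. The sketch below is not from his
talk; it is a minimal illustration that borrows two textbook models,
Amdahl's law (fixed problem size) and Gustafson's law (problem size grows
with the machine). The 5% serial fraction and the idea that work grows with
the number of cells in a 3D grid are assumptions picked for illustration,
and the function names are made up for this example.

```python
def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Speedup for a fixed-size problem (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


def gustafson_speedup(nodes: int, serial_fraction: float) -> float:
    """Scaled speedup when the problem grows with the machine (Gustafson's law)."""
    return serial_fraction + (1.0 - serial_fraction) * nodes


def resolution_gain(nodes: int, dims: int = 3) -> float:
    """Growth in grid points per dimension at a fixed run time.

    Assumes work is proportional to the number of grid cells and that the
    extra cells parallelize perfectly. Both are simplifications.
    """
    return nodes ** (1.0 / dims)


if __name__ == "__main__":
    serial_fraction = 0.05  # hypothetical value, chosen only for illustration
    for nodes in (8, 64, 512):
        print(
            nodes,
            round(amdahl_speedup(nodes, serial_fraction), 2),
            round(gustafson_speedup(nodes, serial_fraction), 2),
            round(resolution_gain(nodes), 2),
        )
```

Under these assumptions, running the same coarse simulation on more nodes
runs into diminishing returns, while growing the simulation along with the
cluster keeps nearly every added node busy. That is the trade Bunge described.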
The presentation on Linux cluster security by Neil Gorsuch of NCSA's
Clustering Group drove home the need for an easily manageable site-wide
security configuration tool. In academia the goals and needs of researchers
outweigh what any business would consider standard security practice. As a
result, some clusters must have their nodes publicly accessible (all of
the nodes). Trey White of ORNL discussed issues faced when using proprietary
clustering solutions. Apparently, getting different high-end vendors'
clustering solutions to play nice across hardware platforms can be
slightly difficult.
Support for the conference came from the NCSA Alliance, along with IBM,
Intel and Myricom. As a result, several of the vendors gave presentations.
Chuck Seitz -- CEO and CTO of Myricom -- went into the science of high-performance
networking with "Dispersive Routing in Clos Networks". Tim
Mattson discussed Intel's work in cluster computing while Luiz DeRose shared
information on the Linux cluster tools being developed by IBM's Advanced
Computing Technology Center.
NCSA did a nice job in planning the conference. Presenters were given plenty
of time not only to show their work but also to answer questions as needed.
Likewise, downtime between presentations gave people a chance to network
and mingle. Lunch was provided, which kept the focus on the conference,
and dinner was entertaining. Entertainment for Tuesday's dinner included
an indoor mini-golf course and a mechanical bull. Perhaps not what you'd
expect from a bunch of people discussing gigs and high-performance networking,
but it was a hit. John Towns commented that the computing
atmosphere reminds him of the early 80's in terms of the close collaboration
between the research institutions and the Crays and CDCs. Those close
collaborations are now being paralleled with Linux cluster solution providers,
particularly at the high end. John said the level of collaboration taking
place was refreshing and called the conference a success because of that.
Sean "The RIMBoy" Jewett is a Systems Administrator for the Center for
Structural Biology at Vanderbilt University. By day he wrangles Linux and
SGI; by evening he maintains his own floppy distro and is the VP of NLUG. The
Center sponsored his attendance at NCSA's conference.