"Electric Mayhem", RIMBoy's Cluster

The Electric Mayhem Cluster


Here's some info on my cluster, the Electric Mayhem. The name comes from the band on the Muppet Show, otherwise known as Dr. Teeth and the Electric Mayhem. I decided on the name given the cluster's purpose and the fact that it basically sucks electricity. Dr. Teeth is the name of my main workstation (he was the band leader), and key nodes are named for members of the band. Janice is the name of my master node in the Mosix cluster, Sam is the name of the central file server. Why not Animal? Animal is the name of my iBook.

What is the purpose of the cluster? Good question. First and foremost it is a test bench for the Mosix clustering technology. That said, there is no point in having a cluster when there is nothing to run on it. Given the large cd collection I have yet to properly rip into mp3 format, I decided to test Mosix clustering by putting it to work ripping mp3's.

The roots of this project were laid sometime in November / December of 2000 while I was working on my mp3 streaming server on a floppy. After realizing many other projects could benefit from Linux based single floppy systems, I set out to find other utilities that could be condensed to the size of a floppy. At some point (~Feb / March 2001) I realized Mosix could be crammed onto a floppy.

So the roots of the RIMigration project were in place. It just became a matter of taking the time to put the hooks into place, and doing it in such a manner that I could change certain key configuration items on the fly without having to rebuild the floppy each and every time. Thus items such as hosts and mosix.map are pulled down every time a floppy based Mosix node comes online. DHCP support is in the works, but right now each floppy has a static IP address.
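As a rough illustration of what a centrally maintained mosix.map might contain, here is a minimal sketch that collapses a list of node addresses into map lines. It assumes the classic three-field layout (first node number, first IP address, number of consecutive nodes); the function name and the sample addresses are hypothetical and are not taken from the actual RIMigration floppies.

```python
import ipaddress


def build_mosix_map(addresses):
    """Collapse IPv4 addresses into 'node-number first-ip count' lines.

    Assumed layout: one line per run of consecutive addresses, with node
    numbers assigned in ascending address order starting at 1.
    """
    ips = sorted({ipaddress.IPv4Address(a) for a in addresses})
    lines = []
    node = 1
    i = 0
    while i < len(ips):
        j = i
        # Extend the run while the next address is exactly one higher.
        while j + 1 < len(ips) and int(ips[j + 1]) == int(ips[j]) + 1:
            j += 1
        count = j - i + 1
        lines.append(f"{node} {ips[i]} {count}")
        node += count
        i = j + 1
    return lines
```

Keeping a file like this on the server side means adding a node is a one-line change that every floppy picks up at its next boot.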

The process of building the cluster continued off and on over the summer of 2001 and into the fall. I was finally given a hard deadline to have the cluster up and running, and the wheels were set into motion. Networking topology and ways to make the various jobs communicate with each other needed to be put into place, scripts to be written, wire to be pulled, cables to be crimped.

Many of the pieces of hardware were in place before the cluster was built; often they were key to the inspiration for the project. The roots were laid early on when a guy gave me a 7 disc Nakamichi jukebox. I realized early on that multi-disc cdrom drives were the key to ripping with my cluster. With hardware such as that, rather than feed a cd repeatedly, I could load up 7 or so cd's, turn it loose and come back 12-24 hours later. I did not care so much about the speed at which the ripping occurred. I cared more about the fact that 7 discs were ripped with me only having to load the drive(s) once.

Of course the 7 disc jukebox is not online (yet); however, I was given a 7 disc cd tower for a previous ripping project. I ripped out all of the drives (some were bad) and, with a good deal from HiTechCafe, replaced all of the drives in the tower.

Here are the two towers I have online at this time:

13 cd capacity across 2 systems

The above towers have the ability to hold 13 cd's. The 7 disc tower was truly built as a CDROM tower; it is connected to a 486 in a separate case below. The 6 disc tower is home built and utilizes a full tower case that was given to me at the Dayton Hamvention several years ago. It contains a 486 system. Interconnects to both cdrom towers are scsi, of course. I chose the 486's after doing some tests. In my Mosix testing I determined that a 486dx66 was the slowest processor I wanted to have in a cluster. Since I had a number of pentiums set aside for the cluster, I put a dx50 and a dx66 into these ripping systems. Cdparanoia requires a fair bit of math to compensate for jitter and read errors. The 486's in my opinion are adequate for this purpose. Again, I'm not after speed; most of these drives are 6x and 8x drives. I prefer the drives to take their time so they have a fighting chance against errors.

The ripping systems are quite simple. They are floppy based systems with NFSv3 support in a 2.2.19 kernel. Cdparanoia and a small C program to determine the info the freedb database needs for the query are on board. A simple ash based shell script iterates over the device id's, passing them off as options to cdparanoia. Once a cd is ripped it is ejected. After the last disc ejects the script goes into a wait mode. I simply reload each drive, press enter on the keyboard and the drives start the process again. Each system must have more than 16 megs in order for cdparanoia to operate. There is no swap space. Each floppy (as with all that I put together) is loaded into ram at boot time; it is not and cannot be accessed once the system is booted.
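The rip, eject, wait cycle described above can be sketched roughly as follows. This is a Python stand-in for the idea, not the actual ash script from the floppy: the device names in the usage note are hypothetical, and the only cdparanoia options used are -d (select the device) and -B (batch mode, one wav file per track).

```python
import subprocess


def rip_batch(devices, run=subprocess.call):
    """Rip each drive in turn, ejecting a disc as soon as it is done.

    `run` is injectable so the loop can be exercised without real drives.
    """
    for dev in devices:
        # One wav file per track, written to the current directory.
        run(["cdparanoia", "-d", dev, "-B"])
        # Eject right away so a finished disc is easy to spot.
        run(["eject", dev])


def rip_forever(devices):
    """Rip a full load of discs, then wait for the drives to be reloaded."""
    while True:
        rip_batch(devices)
        input("Reload the drives and press Enter to start again ")
```

On a real system one would call something like rip_forever(["/dev/scd0", "/dev/scd1"]) from the directory where the wav files should land.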

Each of those systems saves the resulting wav files across an nfs mount to a central nfs server (Sam). The cdid info for the freedb query is saved to a text file. Once the entire cd is ripped, a file called fin is also written to the filesystem. Its purpose will be described later.
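Since fin is only written after the whole cd has been ripped, anything reading the nfs share can treat it as a "this disc is complete" marker. The sketch below shows one way such a check could look. It assumes a hypothetical layout of one directory per disc under a common root, which the text does not confirm, and the function name is made up for illustration.

```python
from pathlib import Path


def finished_rips(root):
    """Return the disc directories under `root` that contain a 'fin' file.

    Assumed layout (hypothetical): root/<disc-dir>/{*.wav, cdid text, fin}.
    Directories without 'fin' are still being ripped and are skipped.
    """
    return sorted(marker.parent for marker in Path(root).glob("*/fin"))
```

Checking for the marker rather than for the wav files themselves avoids picking up a disc that is only half written.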

CD ripping systems with nfs server

The nfs server is running kernel 2.4.10 patched with xfs (no particular reason for using this minor version kernel fyi). Two things were important. The nfs server had to support nfs3, which the 2.4 kernel does fairly well (much better than 2.2.x). Also, because I have a 40 gig HD (in the removable bay) I wanted xfs for those occasional power outages and for performance. Funny, that would be tested less than 24 hours after I brought the cluster online. Since everything would end up on the cluster one way or the other, I added an 8x scsi cdburner to the system. The NFS server is a Celeron 366.

Networking is of course the key to making the cdrippers talk to the nfs server. Given that I was trying to build this cluster as cheaply as possible, I decided to scrounge and use whatever I could find around the house. I have quite a bit of 10baseT gear so it became the basis of communications. At the 2001 Hamvention, a vendor was giving away 6 Cabletron managed hubs. They were quite old and utilized the champ connectors that the telephone company uses on location. This was probably the reason the guy gave them away. I knew I could wire them up, having worked with 10baseT in a similar configuration at work. A local ham friend had punchdowns for cheap and I acquired a couple of prewired champ connectors while helping a friend re-wire his office (the former occupant had been an old bbs / ISP).

While this may not be the fastest of networking, I did it for very little cost and overall it appears to perform as needed.

Cabletron Hubs in action


Closeup of the patch panel


Wide angle of patch panel and wiring


Freesco Floppy based router and hubs

I used the Freesco floppy based router as the gateway between the cluster network (in particular the nfs side of the network) and the rest of my home network. Given the volume of traffic I'd be moving, I decided it was best that it not interfere with the wife's ICQ sessions.

Another design consideration was to keep the Mosix traffic separate from the nfs traffic. This too would help performance on my otherwise low end network. Thus, a master node was designated for the Mosix cluster. It is a dual homed P200. One interface talks to the nfs server, the other talks to the Mosix network. So far this configuration appears to work well. It may be possible to bring another master node online and have it work in conjunction with the current master node should the need arise. As a matter of fact, because of the design of Mosix this would be plug and play. This could be put into place should there be IO bottlenecks, or for general failover should a master fail.

Mosix Node, Cyrix 120 (l), Master Node, P200 (r).

The master node has 64 megs of ram. With approximately 7 lame processes and the associated OS overhead the system never dips into swap. Each slave system has 16 megs of ram. That is the minimum for each node. While the nodes can run with less, they may not have a process migrated to them. Each slave node boots off the floppy (of course) and thus has the entire OS loaded into ram. Memory is already at a premium.

Top row, l-r P133, DX4-100, P133, P133

The Mosix cluster has to be fairly loaded before processes will be migrated to the DX4-100. I've found that it takes around 7 Lame processes before a Lame process will be migrated to the DX4-100. The nice part about Mosix is that when other nodes are freed up, that process can be migrated to a faster system. In short, it effectively utilizes compute cycles. Anyone that can do work does, anyone that can do it better is chosen.
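To get a feel for why the slowest node only sees work once the others are busy, here is a toy placement model. It is not Mosix's actual algorithm, which is more involved and migrates running processes. It simply assumes each new process goes to the node where it would get the largest share of cpu, and it uses clock speed in MHz as a crude, hypothetical stand-in for real performance.

```python
from fractions import Fraction

# Hypothetical relative speeds: raw clock rate in MHz, nothing more.
NODES = {
    "P200 master": 200,
    "Cyrix 120": 120,
    "P133 a": 133,
    "P133 b": 133,
    "P133 c": 133,
    "DX4-100": 100,
}


def pick_node(speeds, loads):
    """Choose the node with the lowest (processes + 1) / speed.

    Fractions keep the comparison exact; ties go to the node listed first.
    """
    return min(speeds, key=lambda name: Fraction(loads[name] + 1, speeds[name]))


def simulate(speeds, n_procs):
    """Place n_procs identical processes one at a time; return load per node."""
    loads = {name: 0 for name in speeds}
    for _ in range(n_procs):
        loads[pick_node(speeds, loads)] += 1
    return loads
```

With these assumed speeds, simulate(NODES, 6) leaves the DX4-100 idle and the seventh process is the first one placed on it. That lines up with the observation above, but it depends entirely on the made-up speed figures.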

The wiring Mess


Waiting in the wings: The Kubik dual cdrom drive 240 cd cdrom carousel