
Boris - The Beowulf Cluster.

For details about the queue system, see the Beowulf Cluster Queues page.

The latest addition to our computer system is a Linux-based Beowulf cluster. At present we have about 100 machines giving a total of about 480 CPUs, 1TB of RAM and 30TB of disk on the cluster.

The front end machine

The front end is a Core2 Quad Q9950 on a Tyan S5220AG2NR motherboard,
4x 2GB DDR800,
2x 750GB Western Digital hard disks (set as RAID-1), mounted in a hot swap Proware 223A disk cage with a
3Ware 8006-2LP RAID-1 controller,
mouse, keyboard and a Viewsonic 17" LCD monitor,
built-in dual Intel 1000pro Gigabit ethernet and an additional Intel gigabit network card.

This machine also has the Pgroup and Intel performance compilers installed. It houses an NFS drive that contains all the programs and ancillary data that the cluster nodes may need. We run the Torque batch queue system with the Maui scheduler. For now we have four queues with MAXJOBS and MAXJOBPERUSER limits on each queue.

The cluster network is housed in a 19" rack with a stack of Cisco SLM-2048 gigabit switching hubs in the actual cluster room, which is accessed via the 1st gigabit port. Another gigabit ethernet card connects to the office network to allow access to/from the workstations, and the 2nd gigabit port is connected to a small backend network that connects all the servers together. This machine is not housed with the cluster but is instead in the office server room and linked to the cluster room on the floor below.
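To illustrate how the per-queue MAXJOBS and MAXJOBPERUSER limits mentioned above behave, here is a minimal Python sketch. It is not Torque or Maui code: the queue names, the limit values and the can_start helper are all made up for the example, and it simply assumes that a job may start only while both the queue-wide and the per-user running-job counts are under their limits.

```python
from collections import Counter

# Hypothetical limits for two queues; the real queue names and values
# used on the cluster are not given on this page.
QUEUE_LIMITS = {
    "short": {"MAXJOBS": 4, "MAXJOBPERUSER": 2},
    "long": {"MAXJOBS": 2, "MAXJOBPERUSER": 1},
}


def can_start(queue, user, running):
    """Return True if a new job for `user` fits under both limits of `queue`.

    `running` is a list of (queue, user) pairs for the jobs currently running.
    """
    limits = QUEUE_LIMITS[queue]
    users_in_queue = [u for q, u in running if q == queue]
    if len(users_in_queue) >= limits["MAXJOBS"]:
        return False  # the queue as a whole is full
    per_user = Counter(users_in_queue)
    return per_user[user] < limits["MAXJOBPERUSER"]


running = [("short", "alice"), ("short", "alice"), ("short", "bob")]
print(can_start("short", "alice", running))  # alice already runs 2 jobs in "short"
print(can_start("short", "bob", running))    # bob runs only 1 job in "short"
```

In this toy model a job that fails the check would just stay queued until one of the counts drops.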

The cluster backend nodes have been split across two different rooms. The bulk of the cluster is housed in a purpose-built room in the basement. The room has a raised floor with the cables running in the floor space and 10 cabinets of 23" shelving. Each cabinet has 4 shelves and can hold eight machines. There is approx 24" of space behind each cabinet, and we have a small cart equipped with an old 15" monitor and keyboard that can be wheeled to a machine and plugged in as required.

The A/C is separate from the main building's and runs 24/7. Approx 1/3 of the ceiling space is given over to A/C outlets and heat removal panels. Originally there were 2 air coolers, each of which can supply 600 cfm of air cooled to 12C. With 110 CPUs in the room and the two coolers working on medium supplying 15C air, the room stayed at about 28C. Since then we've added a third cooler to get the temperature down to approx 22C.

The electrical supply to the room is taken straight from the building's main power room as 3 phase 380V with breakers inside the cluster room. The 3 phase is fed to two 12KVA UPS systems and returns as 110V to two separate breaker panels. Each cabinet has two power supplies and thus two breaker switches. About 60 CPUs use 65% of one 12KVA UPS. The rest of the cluster is in a shared computer room with similar power, flooring and racking.
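The UPS figure above (about 60 CPUs using 65% of one 12KVA UPS) gives a rough way to estimate headroom. The Python sketch below does that arithmetic. It assumes the load scales linearly with the number of CPUs and that every CPU draws the same amount, which is only an approximation; the function names and the 80% headroom figure are illustrative and not taken from the cluster setup.

```python
UPS_CAPACITY_VA = 12_000        # one 12KVA UPS, from the text
MEASURED_CPUS = 60              # "about 60 cpus"
MEASURED_LOAD_FRACTION = 0.65   # "65% of one 12KVA UPS"


def va_per_cpu(capacity_va, cpus, load_fraction):
    """Average apparent power per CPU implied by the measured load."""
    return capacity_va * load_fraction / cpus


def max_cpus(capacity_va, per_cpu_va, headroom=1.0):
    """Whole number of CPUs that fit when loading the UPS up to `headroom`."""
    return int(capacity_va * headroom // per_cpu_va)


per_cpu = va_per_cpu(UPS_CAPACITY_VA, MEASURED_CPUS, MEASURED_LOAD_FRACTION)
print(f"approx {per_cpu:.0f} VA per CPU")
print("CPUs at full UPS load:", max_cpus(UPS_CAPACITY_VA, per_cpu))
# Assumed safety margin: keep the UPS at or below 80% load.
print("CPUs at 80% UPS load:", max_cpus(UPS_CAPACITY_VA, per_cpu, headroom=0.8))
```

Under those assumptions the measured load works out to roughly 130 VA per CPU, so the estimate is only as good as the "about 60" and "65%" figures it starts from.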

The shared room uses 6x 22" cabinets which can hold 8 machines each. The machines there are on a shared UPS system and are networked separately, but with a dedicated link to our front-end machine.

Since then we have converted 2 of the racks to 19" standard, and each one houses its own mini-cluster complete with separate disk arrays and switches. These are normally set up in batches of 4 (32 cores) for parallel computations that require a dedicated shared iSCSI GFS2-mounted disk space. These are based on Dell R610 machines at the moment, with iDRAC6 handling the fencing.

The actual machines:

| Machine Type | Mainboard | Memory | Disk | Net card | Notes |
|---|---|---|---|---|---|
| AMD Phenom quad 9650X4 | Asus M3A78-VM | 4GB DDR800 | 40GB Western Digital | Intel Gigabit card | |
| Intel Core 2 Quad Q9550 | Supermicro C2SBM-Q | 8GB DDR800 | 80GB Western Digital | Intel Gigabit LAN | |
| AMD Phenom quad 9550X4 | Asus M3N78-CM | 4GB DDR667 | 80GB SATA Western Digital | Intel Gigabit LAN card | Use pci=nomsi on boot. AHCI BIOS option for SATA. RHEL 5 2.6.18.x not stable with built-in net. |
| Intel Core 2 Quad Q6600 | Asus P5K-VM, P5E-VM SE, P5B-VM SE, P5Q-EM | 8GB DDR667 | 80GB Western Digital | Intel Gigabit LAN card | Built-in Attansic L1 gigabit: atl1 vendor driver compiled after installation was not stable. Install with SATA=compatible, then switch to enhanced. Use RHEL5. |
| Intel Core 2 Xeon E5405 | Tyan Tempest S5375 | 16GB DDR667 | 500GB SATA Hitachi | Intel Gigabit LAN | Needed to set IDE to AHCI in BIOS to work. |
| Intel Core 2 Xeon X5450 | Tyan Tempest S5396 | 16GB FB-DIMM DDR667 | 750GB SATA Western Digital | Intel Gigabit LAN | Installed with SATA set to AHCI. |
| Intel Core 2 Xeon 6-Core E5645 | Dell R610 | 32GB FB-DIMM DDR1333 | 2x 500GB Seagate SAS | Broadcom Gigabit LAN | Installed with the mptsas driver. |
| Intel Core 2 Xeon 6-Core X5650 | Dell R610 | 32GB FB-DIMM DDR1333 | 2x 500GB Seagate SAS | Broadcom Gigabit LAN | Installed with the mptsas driver. |

The servers are on a private network and all /scratch disks can be automounted on the front end. The front end holds all the user accounts (/home).

The front end machine can be logged into using OpenSSH, and from there users can rsh to the backend machines and use them interactively for a maximum of 5 CPU minutes. The front end is used to submit jobs to a batch queue system, which finds a node to run the job on and then returns the output when it's finished. The cluster is load balanced so that all nodes get 1 job; once every machine has a job, further job submissions are queued to wait for free nodes. The queues are managed on a fair-share basis, so that if one user submits a lot of jobs in one go, a user lower down the queue can leap in front of them rather than wait for all the first user's jobs to complete. The longer a job waits in the queue, the higher its priority for being the next to run. A job can also overrun the queue limits if it has not finished and there are no other jobs waiting to run.
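The scheduling behaviour described above (one job per node, priority that grows with waiting time, and a fair-share penalty for users who already have jobs running) can be sketched as a toy model in Python. This is not how Maui actually computes priorities: the Job class, the two weights and the dispatch function are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user: str
    submitted: float  # submission time, in minutes


# Hypothetical weights; a real scheduler configuration would differ.
WAIT_WEIGHT = 1.0        # priority gained per minute spent waiting
FAIRSHARE_WEIGHT = 10.0  # priority lost per job the user already has running


def priority(job, now, running_per_user):
    """Older jobs rank higher; users with running jobs rank lower."""
    waited = now - job.submitted
    return WAIT_WEIGHT * waited - FAIRSHARE_WEIGHT * running_per_user.get(job.user, 0)


def dispatch(queue, free_nodes, now, running_per_user):
    """Give each free node at most one job, highest priority first."""
    running_per_user = dict(running_per_user)  # work on a copy
    waiting = list(queue)
    assignments = []
    for node in free_nodes:
        if not waiting:
            break
        best = max(waiting, key=lambda j: priority(j, now, running_per_user))
        waiting.remove(best)
        running_per_user[best.user] = running_per_user.get(best.user, 0) + 1
        assignments.append((node, best.name))
    return assignments, waiting


queue = [
    Job("a1", "alice", submitted=0),
    Job("a2", "alice", submitted=1),
    Job("a3", "alice", submitted=2),
    Job("b1", "bob", submitted=5),
]
assignments, still_waiting = dispatch(
    queue, ["node01", "node02"], now=10, running_per_user={}
)
print(assignments)
print([j.name for j in still_waiting])
```

In this example alice's oldest job gets the first node, but once she has a job running the fair-share penalty lets bob's later submission take the second node ahead of her remaining jobs.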

And here is a slightly outdated picture; note the rack, which is custom designed and installed! As you can see, the top shelf is unused at the moment and there is still a lot of room on the 2nd and 3rd ones. We took over a cleaner's cupboard to fit all this in after adding another 3 30amp lines and air con. Actually it all worked very well up to about 16 machines but was a little cramped.

The bottom shelf on the left has 2 UPSes, then the front end machine, then 2 SGI Origin 200s, then a Power Challenge and last a couple of IBM 3CTs. Next shelf up has the old Pentium, which is the ftp/mail/samba/dns/print server, and also a few other items. Second shelf has 8 dual PIII-850s, each with 512MB RAM and either 10GB or 13GB of disk. These machines have on the whole been moved down to the purpose-built server room, and we now use this racking to house the file servers, cluster front end, O200s and disk arrays. It also provides a handy place to temporarily install new nodes before they go down to the main room. This room can house about 20 machines before the A/C cannot cope. Nowadays it usually contains 6-10 machines and disk arrays.

The picture below is how we did it the second time around. Note that while you cannot see it, the cables do all run through the floor space to the UPS and network stack. Here you can see half of the room; the other half, which is off to the left, is shelved in an identical way for a total of 80 machines.

At the far end we have the P4 3GHz machines, and the blue-fronted ones are the P4 2GHz machines. Off to the left of the photo there is another set of racks which house the dual AMDs and the Xeons. Below you will find a photo of the backs of the machines; note the space to allow someone to walk down and also push a monitor cart. The power sockets are mounted on the racks to reduce the cable lengths, and all cables are coiled and tied to make things neater. The network switches are mounted in a 19" rack like the one below.


Last update: Thu Jun 8 23:41:43 CST 2006 Comments to: jon _at_ sinica.edu.tw
These pages were created using vim -a very much vi-improved.

Opinions on these pages are generally not Academia Sinica's.