My little cluster…

So recently I’ve put together a small cluster for testing Lustre (www.lustre.org), a file system technology that I work on directly in my day job. The cluster is a bit interesting as it’s a 5 node, dual E5540, DDR InfiniBand system.

In general it works well; however, it really burns power, so lately it hasn’t been on all the time.


Another year, another Supercomputing; this year, New Orleans

Well, another year has gone by and with it another Supercomputing conference. This year’s SC14 was held in New Orleans. The things I found notable this year were the 8TB archival drive from Seagate, the various 3M Novec cooling demonstrations, the D-Wave quantum processor, and a few other items. I grabbed pictures and they’re available below, enjoy!


Some Lustre FS basics

Lustre is a parallel distributed file system which provides high performance and open licensing. It is often used in supercomputers and extremely large scale hosting facilities (think Netflix via their 3rd party hosts). Most of the world’s largest supercomputers run one or more Lustre file systems, including the world’s fastest supercomputer, Titan.

Lustre runs exclusively on Linux-based systems and is composed of four major parts:

1. MDS (Metadata Server): this single node hosts all file system metadata for a single file system instance, and may host the management service as well. It consists of a supported Linux installation running a supported kernel, and it hosts the one major component, the MDT (Metadata Target), which is usually a redundant high speed storage array of some kind.

When choosing hardware for your MDS it’s critical that it is highly reliable and well tested. It’s also critical that it can perform small reads and writes against your storage target with as much speed as possible. Multiple MDS units can be set up against a shared MDT; however, these units must run in an actively managed Active/Passive failover configuration. (The sketch after this list shows the basic commands for formatting and mounting these targets.)

2. MGS (Management Server): this single node may be part of the MDS (described above) or hosted on its own dedicated node. It’s generally used as a site-wide, multi-file-system management and configuration node; however, with new features such as imperative recovery, the role of this node will be greatly expanded in the future.

3. OSS (Object Storage Server): these are one or more dedicated, high bandwidth servers which host all object data on one or more OSTs (Object Storage Targets). The OSTs provide the primary data store of the file system, and the OSS nodes serve that data directly back to the client nodes.

OSS nodes benefit from as much memory as possible (for file caching) and as much system bus bandwidth as possible; fast CPUs with a good memory and bus architecture help here. The other thing I’ve found works very well is mdraid-based storage targets. These arrays allow for a huge amount of direct tuning and tinkering, but they require much more work to get the most performance out of them (again, see the sketch after this list).

4. Clients: like the server nodes described above, these are dedicated systems; they can serve compute environment needs or file system export needs (say, exporting via CIFS/Samba, NFS, dCache, etc.).
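
To make the pieces above a bit more concrete, here’s a minimal sketch of bringing the server side up. This is a hedged example, not a tuning guide: the file system name (testfs), the device names, and the MGS NID (10.0.0.1@o2ib0) are all hypothetical placeholders, and the commands assume your distribution’s Lustre server packages are installed.

    # MDS/MGS node: format and mount a combined MGS + MDT
    # (/dev/sdb stands in for a redundant, high speed array)
    mkfs.lustre --fsname=testfs --mgs --mdt --index=0 /dev/sdb
    mkdir -p /mnt/mdt
    mount -t lustre /dev/sdb /mnt/mdt

    # OSS node: build an mdraid target, then format it as an OST
    # (RAID-6 across six hypothetical disks; level and layout are tunables)
    mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[c-h]
    mkfs.lustre --fsname=testfs --ost --index=0 --mgsnode=10.0.0.1@o2ib0 /dev/md0
    mkdir -p /mnt/ost0
    mount -t lustre /dev/md0 /mnt/ost0

Note that mounting a target with -t lustre is what actually starts the Lustre services on that node; there’s no separate server daemon to launch.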

The big concern when fitting out your file system is that it will only be as fast as its slowest part. GigE, or even 10GigE and better, will work; however, remember that you’re losing roughly 30% to TCP/IP overhead, and you’re also burning CPU cycles processing all of those packets. A better choice is an InfiniBand interconnect, which can run Lustre’s LNET networking (which I will describe in more detail in the next article).
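
Since LNET gets its own article, just a taste here: a minimal sketch of pointing LNET at an IB interface and mounting the file system from a client. The interface name (ib0), and the NID and file system name carried over from the sketch above, are assumptions for illustration.

    # every node: tell LNET to use the o2ib driver on ib0
    # (goes in /etc/modprobe.d/lustre.conf)
    options lnet networks=o2ib0(ib0)

    # load the module and verify this node's NID
    modprobe lnet
    lctl network up
    lctl list_nids

    # client: mount the file system over InfiniBand
    mkdir -p /mnt/testfs
    mount -t lustre 10.0.0.1@o2ib0:/testfs /mnt/testfs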

In the next article I’ll discuss file system configuration and LNET (Lustre Networking).

Going to post some Lustre FS basics on YouTube…

I’ve been thinking about this for a while now. To help the geeks out there who might have an interest in Lustre FS, I’m going to post a series of videos on YouTube detailing how to set up and administer a basic Lustre cluster, as well as more advanced options for high capacity and high performance use cases.

This should be generally interesting for anyone who’s remotely interested in supercomputing technology, as this file system drives the largest machines on the planet.

SC12 images