InfiniBand Tips and Tricks

Below are some commands I’ve found useful when dealing with InfiniBand and related issues.

Determine port state on all ports available within the fabric

The ibnetdiscover command will show you the state of each port in the fabric. It’s a good idea when running this command (on any node) to redirect the output into a file, especially if you have a large number of ports.
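A minimal sketch of saving the output to a timestamped file, so that runs can be compared later. The function names and the file naming scheme are my own choices for illustration; nothing here is required by the tool itself.

```shell
# Build a timestamped file name such as ibnetdiscover-20150414-172356.txt.
# The naming scheme is an assumption made for this example.
fabric_snapshot_name() {
    printf 'ibnetdiscover-%s.txt\n' "$(date +%Y%m%d-%H%M%S)"
}

# Run ibnetdiscover (normally as root, on any node in the fabric) and
# write the result to a file instead of the terminal.
save_fabric_snapshot() {
    out=$(fabric_snapshot_name)
    ibnetdiscover > "$out" && echo "saved $out"
}

# Usage:
#   save_fabric_snapshot
```

Keeping one file per run means two snapshots can simply be compared with diff when a link goes missing.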

Determine individual port information

The ibportstate command can provide additional port information once you have the LID (available from ibnetdiscover) and the port number.

Example:

ibportstate -L <lid> <port> query

Determine current node description

smpquery nodedesc 1

Enable / Disable a port

ibportstate -L <lid> <port> enable
ibportstate -L <lid> <port> disable

New Perl module: Filesys::Virtual::Chroot

I’ve been trying hard lately to take useful code I’ve written over the years for different projects (such as my predictive anti-spam system Ruckus scanmail) and republish the libraries with more generic names on CPAN.

Filesys::Virtual::Chroot provides advisory functions for creating a virtual chroot environment. This is useful when you wish to lock a process which takes input from the wild into a set of directories.
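To illustrate the general idea of an advisory (non-kernel) chroot, here is a minimal sketch of a path confinement check. This is not the module's API, just the underlying concept; the function name inside_root is hypothetical, and the sketch assumes GNU coreutils realpath is available.

```shell
# Succeeds (exit 0) when PATH, taken relative to ROOT, still resolves to
# a location inside ROOT; returns 1 when it escapes, 2 on errors.
# Arguments: ROOT PATH
# realpath -m resolves ".." and symlinks without requiring the path to exist.
inside_root() {
    root=$(realpath -m -- "$1") || return 2
    target=$(realpath -m -- "$1/$2") || return 2
    # Everything is inside "/", so there is nothing to check.
    [ "$root" = "/" ] && return 0
    case "$target/" in
        "$root"/*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Usage:
#   inside_root /srv/data uploads/a.txt   && echo "allowed"
#   inside_root /srv/data ../../etc/passwd || echo "rejected"
```

The trailing slash in the comparison is deliberate: without it, /srv/data2 would be mistaken for a path inside /srv/data.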

This library can be downloaded here directly: Filesys-Virtual-Chroot-1.3.tar

Or pulled off of CPAN with the command:

sudo cpan Filesys::Virtual::Chroot

Another year, another LUG.

Well it’s been another year, and another Lustre User Group meeting. Many interesting discussions took place, though in general I found the most useful session came after the LUG itself was over: the developer meeting was, for me at least, an excellent use of my time.

Got to hang out with some colleagues and old friends.

Quite a few ‘primary’ developers at the developer meeting.

Multiply-claimed blocks errors

So your file system has crashed… Upon running e2fsck against it you’re now seeing errors such as:

Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 4195619:
167904376 167904377 167904378 167904379 167904380 167904381 167904382 167904383 167904384
167904385 167904386 167949296 167949297 167949298 167949299 167949300 167949301 167949302
167949303 167949304 167949305 167949306 

What does this all mean?

Well unfortunately for you this means that your file system has incurred major damage.

What’s happened is the inode structures around this part of the file system have been damaged to the extent that multiple inodes are claiming to own the same data blocks.

There is no way to really fix this without rolling back the file system or restoring from backup.

The inode flagged in the output above is 4195619. Other inodes had already laid claim to the blocks listed, and now this inode is trying to claim them as well. Two different inodes should never own the same block (hard links don’t count, since they are just multiple directory entries pointing at a single inode).

Sadly, the inodes which were discovered before inode 4195619 could be the damaged ones, and this inode could be fine. The only way to know for sure is to dump the inode and its associated blocks out to a file, then verify that file against a checksum you took previously.
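A rough sketch of that dump-and-verify step, using debugfs (which opens the device read-only unless told otherwise) and sha256sum. The function names, the device name, and the output path are placeholders for illustration, and the choice of SHA-256 is an assumption; use whatever checksum you originally recorded.

```shell
# Dump the contents of an inode to a regular file.
# Arguments: DEVICE INODE OUTFILE
dump_inode() {
    debugfs -R "dump <$2> $3" "$1"
}

# Compare a file's SHA-256 against a previously recorded checksum.
# Prints "match" or "MISMATCH"; the exit status is 0 only on a match.
# Arguments: FILE EXPECTED_SHA256
verify_checksum() {
    actual=$(sha256sum "$1" | cut -d' ' -f1)
    if [ "$actual" = "$2" ]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Usage:
#   dump_inode /dev/sdXY 4195619 /tmp/inode-4195619.bin
#   verify_checksum /tmp/inode-4195619.bin <previously recorded sha256>
```

If you don’t know which file the inode belongs to, debugfs -R "ncheck 4195619" /dev/sdXY will try to map the inode number back to a path name, which tells you whose checksum to look up.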

Dell XPS 13 (9333) A06 BIOS (UPDATED)

Dell has released the A06 BIOS version for the XPS 13 (9333) model. The release notes (available here) are extremely light on detail; however, I can tell you it does seem to resolve (at least so far for me, and others) the issue with unreliable suspend and resume!

It doesn’t seem to resolve the bizarre problem with delayed brightness key controls though.

For those of you tux lovers out there, the upgrade is simple: grab a good USB stick (the cheap ones tend to not boot correctly), install FreeDOS on it via unetbootin, download the BIOS image .exe file and copy it to the stick, then boot from the stick and execute the .exe. The BIOS installer will start; walk through the prompts and wait for it to reboot your system.

UPDATE:

Well, sadly, after a few days of testing it would seem the BIOS bug which results in a lockup on shutdown still persists. It took considerably longer to trigger, but it’s still present.

Dell XPS 13 (9333) with A05 BIOS

So over the last few weeks I’ve been trying to sort out the final problems with my XPS 13. The last, and most annoying problem I’ve seen is the frequent and seemingly random lockups on suspend.

This bothers me a lot because it effectively means I can’t just close the lid on the laptop and toss it in a bag and go.

Based on what I’ve observed the issue seems to be solidly within the A05 BIOS. From my understanding, speaking with Dell Pro Support, this BIOS revision is beta, and only available to those users who have either received new systems after October 15th 2014, or had their motherboards replaced to address the coil whine problem.

The issue itself appears to relate closely to the power state the laptop is in, and the fact that the OS may change the power state when plugged in to something not recognized or supported by the BIOS at the time.

So effectively you can suspend and resume reliably, so long as you don’t have the power cord plugged in while the system is running. Even when it is plugged in, it seems to take some time (i.e. a secondary power state change) before suspending into memory reliably triggers the hang.

I can’t seem to determine much past that as the OS has handed off duties to the BIOS and it’s the BIOS which fails to bring the system down to sleep.

If you want to comment with Dell on this, I have an active thread here:

http://en.community.dell.com/techcenter/os-applications/f/4613/t/19604474

Determine existing journal size on an Internal journal for EXT3 / EXT4 / LDISKFS file systems

Just a quick note about figuring out existing internal journal sizes for EXT3 / EXT4 and LDISKFS file systems:

The first thing to do is determine the journal inode. This is usually inode 8; however, it may differ depending on the file system, so it’s worth checking:

# tune2fs -l /dev/sdXY | grep -i "journal inode"

This command returns the inode in which the journal resides:

Journal inode: 8

Next you’ll need to probe that inode directly with the debugfs tool. This can be done with the following commands:

debugfs /dev/sdXY
debugfs 1.42.9 (4-Feb-2014)
debugfs: stat <8>

Which results in the following output:

Inode: 8   Type: regular    Mode:  0600   Flags: 0x80000
Generation: 0    Version: 0x00000000:00000000
User:     0   Group:     0   Size: 134217728
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 262144
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
 atime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
 mtime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
crtime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
Size of extra inode fields: 28
EXTENTS:
(0-32766):2655233-2687999, (32767):2688000

What we’re interested in here is the Blockcount:

Links: 1   Blockcount: 262144

debugfs reports Blockcount in 512-byte units rather than in file system blocks, so in this case we have 262144 × 512 bytes = 134217728 bytes, which equals 128 megabytes. The extent list agrees: logical blocks 0 through 32767 are 32768 blocks of 4096 bytes each.

Also, on some newer versions of debugfs, the size of the inode in bytes is shown directly in the Size: field:

User:     0   Group:     0   Size: 134217728
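As a quick cross-check of the arithmetic: debugfs reports Blockcount in 512-byte units, while the EXTENTS line counts file system blocks (4096 bytes each here, which is an assumption about this particular file system). Both should land on the value in the Size: field. The variable names below are just for illustration.

```shell
blockcount=262144            # Blockcount from "stat <8>", in 512-byte units
fs_blocks=$(( 32767 + 1 ))   # logical blocks 0..32767 from the EXTENTS line
block_size=4096              # assumed file system block size in bytes

from_blockcount=$(( blockcount * 512 ))
from_extents=$(( fs_blocks * block_size ))

echo "from Blockcount: $from_blockcount bytes ($(( from_blockcount / 1048576 )) MiB)"
echo "from extents:    $from_extents bytes ($(( from_extents / 1048576 )) MiB)"
```

Both lines report 134217728 bytes (128 MiB). The block size itself can be confirmed with tune2fs -l /dev/sdXY | grep -i "block size".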

Rsync snapshots / rolling backups

Today I had the task of updating my backup system. For a long time I didn’t bother rolling backups because, well, I just didn’t care. However, I now wanted to set something up. I remembered that long ago I had done this before, based on the excellent, albeit dated, write up from Mike Rubel. After some hacking I rewrote his script to make it a bit more useful in my environment. The script now accepts a single argument, which points to a configuration file. To download the initial version of this tool, which I called ‘snapshots’, click on the link below.

snapshots.tar.gz
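For anyone who just wants the gist without downloading anything: the following is not the snapshots script, only a minimal sketch of the general hard-link rotation idea. The function names, the daily.N directory naming, and the use of rsync’s --link-dest option are assumptions made for this example.

```shell
# Rotate snapshot directories under BASE, keeping KEEP copies:
# the oldest (daily.KEEP-1) is removed and daily.k becomes daily.(k+1).
# Arguments: BASE KEEP
rotate_snapshots() {
    base=$1; keep=$2
    [ -n "$base" ] || return 2
    i=$(( keep - 1 ))
    rm -rf "$base/daily.$i"
    while [ "$i" -gt 0 ]; do
        prev=$(( i - 1 ))
        if [ -d "$base/daily.$prev" ]; then
            mv "$base/daily.$prev" "$base/daily.$i"
        fi
        i=$prev
    done
}

# Take a fresh snapshot of SRC into BASE/daily.0. Files unchanged since
# the previous snapshot (now daily.1) are hard-linked rather than copied.
# BASE should be an absolute path, since --link-dest is otherwise
# interpreted relative to the destination directory.
# Arguments: SRC BASE KEEP
take_snapshot() {
    rotate_snapshots "$2" "$3" || return
    rsync -a --delete --link-dest="$2/daily.1" "$1/" "$2/daily.0/"
}

# Usage:
#   take_snapshot /home /backup/home 7
```

Because unchanged files are hard links, each extra snapshot only costs the space of whatever actually changed since the previous run.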