A simple tip: to decode thread stack dumps from the Lustre file system, which are typically dumped into /tmp, run:
lctl df <input file> <output file>
This will leave you with an ASCII-formatted, human-readable stack trace which can be used for further debugging.
Below are some commands I’ve found useful when dealing with InfiniBand and related issues.
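When several dumps have piled up, the same command can be looped over all of them. This is only a minimal sketch: the function name decode_lustre_dumps is made up for illustration, and the lustre-log.* file name pattern is an assumption about how the dumps are named on your system, so adjust it to match what you actually see in /tmp.

```shell
# Decode every binary Lustre debug dump in a directory into readable text.
# Usage: decode_lustre_dumps [directory]   (defaults to /tmp)
# NOTE: the lustre-log.* pattern is an assumption; change it if your dumps differ.
decode_lustre_dumps() {
    dir=${1:-/tmp}
    for dump in "$dir"/lustre-log.*; do
        # Skip an unexpanded glob and any text files produced by an earlier run
        [ -f "$dump" ] || continue
        case $dump in *.txt) continue ;; esac
        # Same command as above: binary input file, text output file
        lctl df "$dump" "$dump.txt"
    done
}
```

Running decode_lustre_dumps with no argument leaves a .txt file next to each dump in /tmp.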
Determine port state on all ports available within the fabric
The ibnetdiscover command will show you the state of each port in the fabric. It’s a good idea when running this command (on any node) to redirect the output into a file (especially if you have a large number of ports).
Determine individual port information
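A small wrapper makes the redirect a habit. This is a sketch only; the function name save_fabric_topology and the timestamped file name are my own choices for illustration, not anything ibnetdiscover provides.

```shell
# Save the fabric topology reported by ibnetdiscover to a timestamped file.
# Usage: save_fabric_topology [output directory]   (defaults to the current directory)
save_fabric_topology() {
    outdir=${1:-.}
    # Hypothetical naming scheme: one file per run, so older runs can be compared later
    outfile="$outdir/ibnetdiscover-$(date +%Y%m%d-%H%M%S).txt"
    ibnetdiscover > "$outfile" || return 1
    # Print the file name so the caller can grep or diff it
    echo "$outfile"
}
```

Keeping one file per run means a later run can be compared against an earlier one with diff when a port goes missing.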
The ibportstate command can provide additional port information once you have the LID and port number (both available from ibnetdiscover):
ibportstate -L <lid> <port> query
Determine current node description
smpquery nodedesc 1
Enable / Disable a port
ibportstate -L <lid> <port> enable
ibportstate -L <lid> <port> disable
I’ve been trying hard lately to take useful code I’ve written over the years for different projects (such as my predictive anti-spam system Ruckus scanmail) and republish the libraries with more generic names on CPAN.
Filesys::Virtual::Chroot provides advisory functions for creating a virtual chroot environment. This is useful when you wish to lock a process which takes input from the wild into a set of directories.
This library can be downloaded here directly: Filesys-Virtual-Chroot-1.3.tar
Or pulled off of CPAN with:
sudo cpan Filesys::Virtual::Chroot
So there are multiple methods to pull this off; however, after doing it by hand, using DVD::rip, using Handbrake, etc., I finally settled on VLC. Yes, good old VLC has DVD ripping support. http://www.instructables.com/id/How-to-rip-DVDs-for-free-with-VLC outlines how this is done, and from my recent tests, it seems to work extremely well.
Well, it’s been another year, and another Lustre User Group meeting. There were many interesting discussions, though in general I found the most useful session to be the one held after the LUG was completed. The developer meeting, for me at least, was an excellent use of my time.
Got to hang out with some colleagues and old friends
Quite a few ‘primary’ developers at the developer meeting.
So your file system has crashed… Upon running e2fsck against it you’re now seeing errors such as:
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 4195619:
167904376 167904377 167904378 167904379 167904380 167904381 167904382 167904383 167904384
167904385 167904386 167949296 167949297 167949298 167949299 167949300 167949301 167949302
167949303 167949304 167949305 167949306
What does this all mean?
Well unfortunately for you this means that your file system has incurred major damage.
What’s happened is the inode structures around this part of the file system have been damaged to the extent that multiple inodes are claiming to own the same data blocks.
There is no way to really fix this without rolling back the file system or restoring from backup.
The inode (in the case above) which was damaged was 4195619. Some other inodes have already laid claim to the blocks listed above and now this inode is also trying to lay claim to them. Multiple inodes cannot claim the same blocks (hard links are not an exception here, since they are several directory entries pointing at a single inode).
Sadly the inodes which were discovered before inode 4195619 could potentially be damaged and this inode could be fine. The only way to know for sure would be to dump the inode and its associated blocks out to a file, then verify that file against a previous checksum you may have taken.
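The dump-and-verify step can be sketched with debugfs, which can write the contents of an inode to a file with its dump request. Treat this as a rough outline under assumptions: the function name dump_inode, the output path, and the choice of sha256sum are mine, and the comparison is only meaningful if you recorded a checksum of the file with the same tool before the damage.

```shell
# Dump the data of one inode to a file and print its SHA-256 checksum.
# Usage: dump_inode <device> <inode number> <output file>
dump_inode() {
    dev=$1
    inode=$2
    out=$3
    # debugfs opens the device read-only unless told otherwise;
    # <N> addresses an inode by number rather than by path
    debugfs -R "dump <${inode}> ${out}" "$dev" || return 1
    # Assumption: the earlier checksum was taken with sha256sum as well
    sha256sum "$out"
}
```

For the case above this would be called as dump_inode /dev/sdXY 4195619 /tmp/inode-4195619.bin, and the printed checksum compared by hand against the one you have on record.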
Dell has released BIOS version A06 for the XPS 13 (9333) model. The release notes (available here) are extremely light on detail; however, I can tell you it does seem to resolve (at least so far for me, and others) the issue with unreliable suspend and resume!
It doesn’t seem to resolve the bizarre problem with delayed brightness key controls though.
For those of you tux lovers out there, the upgrade is simple. Grab a good USB stick (the cheap ones tend not to boot correctly), install FreeDOS on it via unetbootin, download the BIOS image .exe file and copy it to the stick. Then boot from the USB stick and execute the .exe. The BIOS installer will start; walk through the prompts and wait for it to reboot your system.
Well, sadly after a few days of testing, it would seem the bug still persists within the BIOS which results in lock up on shutdown. It took considerably longer to get there, however it’s still present.
So over the last few weeks I’ve been trying to sort out the final problems with my XPS 13. The last, and most annoying problem I’ve seen is the frequent and seemingly random lockups on suspend.
This bothers me a lot because it effectively means I can’t just close the lid on the laptop and toss it in a bag and go.
Based on what I’ve observed the issue seems to be solidly within the A05 BIOS. From my understanding, speaking with Dell Pro Support, this BIOS revision is beta, and only available to those users who have either received new systems after October 15th 2014, or had their motherboards replaced to address the coil whine problem.
The issue itself appears to relate closely to the power state the laptop is in, and the fact that the OS may change the power state when plugged in to something not recognized or supported by the BIOS at the time.
So effectively you can suspend and resume reliably, so long as you don’t have the power adapter plugged in while the system is running. Even then, it seems to take some time (i.e. a secondary power state change) before the hang triggers reliably when suspending to memory.
I can’t seem to determine much past that as the OS has handed off duties to the BIOS and it’s the BIOS which fails to bring the system down to sleep.
If you want to comment with Dell on this, I have an active thread here:
Just a quick note about figuring out existing internal journal sizes for EXT3 / EXT4 and LDISKFS file systems:
The first thing to do is determine the journal inode. This is usually inode 8, but it may differ depending on the file system, so it’s worth checking for sure:
# tune2fs -l /dev/sdXY | grep -i "journal inode"
This command returns the inode at which the journal resides:
Journal inode: 8
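If you want just the number (for example to feed it to the next step), the same output can be trimmed with awk. A small sketch; the function name journal_inode is made up for illustration.

```shell
# Print only the journal inode number for a device.
# Usage: journal_inode /dev/sdXY
journal_inode() {
    # Split each line on the colon and any spaces after it, then print the value
    tune2fs -l "$1" | awk -F': *' '/^Journal inode:/ { print $2 }'
}
```

On the file system above this prints 8.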
Next you’ll need to probe that inode directly with the debugfs tool. This can be done by opening the device and running stat against the inode number:
# debugfs /dev/sdXY
debugfs 1.42.9 (4-Feb-2014)
debugfs: stat <8>
Which results in the following output:
Inode: 8 Type: regular Mode: 0600 Flags: 0x80000
Generation: 0 Version: 0x00000000:00000000
User: 0 Group: 0 Size: 134217728
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 262144
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
atime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
mtime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
crtime: 0x54278f31:00000000 -- Sat Sep 27 22:31:45 2014
Size of extra inode fields: 28
What we’re interested in here is the Blockcount:
Links: 1 Blockcount: 262144
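So in this case we have a block count of 262144. Note that debugfs reports Blockcount in 512-byte units rather than in 4096-byte file system blocks, so this works out to 262144 x 512 = 134217728 bytes, or 128 megabytes.
Also, on some newer versions of debugfs, the size of the inode in bytes can be read directly from the Size: field, which should agree with the figure calculated from the block count: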
User: 0 Group: 0 Size: 134217728
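The arithmetic can be scripted as well. In the output above, Size divided by Blockcount is 134217728 / 262144 = 512, so this sketch assumes Blockcount is counted in 512-byte units. The function name journal_bytes is hypothetical, and it simply reads stat output on standard input.

```shell
# Read "debugfs stat" output on stdin and print the journal size in bytes.
# Assumption: Blockcount is in 512-byte units, as the Size: field above implies.
journal_bytes() {
    awk '{
        # Find the Blockcount: label on any line and convert the value after it
        for (i = 1; i < NF; i++)
            if ($i == "Blockcount:") printf "%d\n", $(i + 1) * 512
    }'
}

# Example using the line from the output above
printf 'Links: 1 Blockcount: 262144\n' | journal_bytes
```

Against a real device it could be fed with something like debugfs -R "stat <8>" /dev/sdXY | journal_bytes, and the result should match the Size: field where your debugfs prints one.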
Today I had the task of updating my backup system. For a long time I didn’t bother rolling backups because, well, I just didn’t care. However, I now wanted to set something up. I remembered that long ago I had done this before, based on the excellent, albeit dated, write-up from Mike Rubel. After some hacking I rewrote his script to make it a bit more useful in my environment. The script now accepts a single argument, which points to a configuration file. To download the initial version of this tool, which I called ‘snapshots’, click on the link below.