Wednesday, 23 February 2011

Hanging process on Solaris

Got a hanging process on Solaris 10 today.

It was a NetBackup process, PID 19177, writing to a tape device.

So I fired up mdb -k and dumped the kernel stacks of the process's threads:

> 0t19177::pid2proc |::walk thread |::findstack -v
stack pointer for thread 30007a49800: 2a1053acb41
[ 000002a1053acb41 sema_p+0x138() ]
000002a1053acbf1 biowait+0x6c(3007eeae0c0, 0, 183fc00, 30005620000, 45c, 3007eeae0c0)
000002a1053acca1 st_cmd+0x2ec(210000045c, 11, 0, 44, 60049bd1dc0, 3007eeae0c0)
000002a1053acd91 st_space_fmks+0xc4(210000045c, 0, 60049bd1dc0, 1e3, 45c, fc00)
000002a1053ace41 st_mtioctop+0xb88(60049bd1dc0, 1e3, 0, 100003, 0, 0)
000002a1053acf11 st_ioctl+0xbc8(210000045c, 60049bd1dc0, ffbfa4d0, 100003, 6005a9704d8, 6d01)
000002a1053ad0e1 fop_ioctl+0x20(6003623d580, 6d01, ffbfa4d0, 100003, 6005a9704d8, 1282a58)
000002a1053ad191 ioctl+0x184(b, 3007927e7a8, ffbfa4d0, 563000, 563000, 6d01)
000002a1053ad2e1 syscall_trap32+0xcc(b, 6d01, ffbfa4d0, 563000, 563000, 0)


The first argument to st_ioctl is a dev_t, so we can use ::devt to split it into its major and minor numbers:

> ::devt 210000045c
MAJOR MINOR
33 1116
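To make the ::devt result less magical, here is a minimal sketch of the same split done by hand. It assumes the 64-bit (LP64) kernel layout, where the upper 32 bits of a dev_t hold the major number and the lower 32 bits hold the minor number. The function name split_devt and the constants are only illustrative and are not part of mdb.

```python
# Sketch: split a 64-bit Solaris dev_t into (major, minor).
# Assumption: LP64 kernel layout with 32 bits of major and 32 bits of minor.
L_BITSMINOR = 32          # number of minor bits in a 64-bit dev_t
L_MAXMIN = 0xFFFFFFFF     # mask covering the minor bits


def split_devt(dev):
    """Return (major, minor) for a 64-bit dev_t value."""
    major = dev >> L_BITSMINOR
    minor = dev & L_MAXMIN
    return major, minor


# The dev_t taken from the st_ioctl frame in the stack above.
major, minor = split_devt(0x210000045C)
print(major, minor)  # 33 1116
```

The upper half 0x21 is 33 and the lower half 0x45c is 1116, which matches what ::devt printed.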


So major 33 is the st (SCSI tape) driver, as assumed, and minor 1116 corresponds to the device /dev/rmt/164*.
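One way to double-check the major number is to look it up in /etc/name_to_major, which lists one driver name and its major number per line. The following is a rough sketch of such a lookup. The helper driver_for_major and the SAMPLE text are made up for illustration, and only the "st 33" pairing comes from this system. On a real box you would pass in the contents of /etc/name_to_major instead.

```python
# Sketch: find the driver name for a major number in name_to_major-style text.
# Assumption: each line has the form "<driver> <major>".
def driver_for_major(name_to_major_text, major):
    """Return the driver name bound to the given major number, or None."""
    for line in name_to_major_text.splitlines():
        fields = line.split()
        # Skip blank, malformed, or comment lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        if fields[1].isdigit() and int(fields[1]) == major:
            return fields[0]
    return None


# Hypothetical excerpt; only "st 33" is taken from the post.
SAMPLE = """\
sd 32
st 33
"""

print(driver_for_major(SAMPLE, 33))  # st
```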
I called the backup team, who found that the tape in this drive had broken and caused the hanging I/O. After they reset the drive and removed the tape, the process terminated as expected.