thbe blogs: Check your server health

From time to time I run into a server that is not responding as expected. Several command-line tools on a Linux host can give you an idea of what is wrong. For that reason I normally install two packages on my CentOS servers:

sudo yum install procinfo sysstat
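If you are not sure whether the packages are already present, a small check like the following can help. This is only a sketch: `missing_tools` is a hypothetical helper name of my own, not part of procinfo or sysstat, and it only tests whether a command is on the PATH (`sar` is the command shipped by the sysstat package).

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# procinfo comes from the procinfo package, sar from sysstat.
missing=$(missing_tools procinfo sar)
[ -z "$missing" ] || echo "Install first:" $missing >&2
```

If nothing is printed, both commands are available and the yum step can be skipped.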

Both tools are quite handy if you want to determine where the performance bottleneck on your machine is. Let's start with something like this:

clear && procinfo && sar

Here is example output taken from one of my servers:

Linux 2.6.18-53.1.4.el5 (mockbuild@builder6.centos.org) (gcc 4.1.2 20070626 ) #1 SMP Fri Nov 30 00:45:55 EST 2007 4CPU [neo.int.XXXXX.XX]

Memory: Total Used Free Shared Buffers
Mem: 8177876 8130848 47028 0 30844
Swap: 4194296 180 4194116

Bootup: Mon Jan 28 11:59:26 2008 Load average: 1.23 1.29 1.07 1/337 13383

user : 2:41:10.65 2.4% page in : 0
nice : 0:00:00.04 0.0% page out: 0
system: 11:23:19.67 10.3% swap in : 0
idle : 3d 23:08:26.08 86.0% swap out: 0
steal : 0:00:00.00 0.0%
uptime: 1d 3:37:54.38 context :504866586

irq 0: 99474323 timer irq 12: 105 i8042
irq 1: 11 i8042 irq 50: 1080554 libata
irq 3: 18 irq 58: 235766212 0 0 235766
irq 4: 20 irq169: 3127348 ioc0
irq 8: 1 rtc irq225: 23 uhci_hcd:usb1, uhci_
irq 9: 0 acpi irq233: 0 uhci_hcd:usb2

Linux 2.6.18-53.1.4.el5 (neo.int.XXXXX.XX) 29.01.2008

11:50:01 CPU %user %nice %system %iowait %steal %idle
12:00:01 all 1,44 0,00 17,79 38,13 0,00 42,64
12:10:01 all 1,04 0,00 13,90 38,40 0,00 46,66
12:20:01 all 2,50 0,00 14,69 39,90 0,00 42,91
12:30:01 all 1,55 0,00 15,28 36,23 0,00 46,94
12:40:01 all 1,75 0,00 17,15 36,47 0,00 44,64
12:50:01 all 1,33 0,00 16,29 37,70 0,00 44,69
13:00:02 all 1,59 0,00 16,15 36,34 0,00 45,91
13:10:01 all 1,69 0,00 16,34 35,76 0,00 46,22
13:20:01 all 1,18 0,00 15,99 37,68 0,00 45,14
13:30:02 all 1,05 0,00 13,35 38,63 0,00 46,97
13:40:01 all 2,71 0,00 18,01 31,93 0,00 47,35
13:50:01 all 3,75 0,00 15,96 32,03 0,00 48,26
14:00:01 all 2,03 0,00 16,36 33,45 0,00 48,17
14:10:01 all 3,40 0,00 18,35 33,36 0,00 44,89
14:20:02 all 2,68 0,00 16,48 35,48 0,00 45,36
14:30:01 all 1,93 0,00 15,84 34,69 0,00 47,53
14:40:01 all 0,80 0,00 6,46 8,21 0,00 84,53
14:50:01 all 1,24 0,00 6,71 3,46 0,00 88,60
15:00:01 all 1,70 0,00 5,25 3,80 0,00 89,25
15:10:01 all 1,53 0,00 11,26 4,50 0,00 82,71
15:20:01 all 0,66 0,00 2,39 1,00 0,00 95,95
15:30:01 all 0,79 0,00 7,52 1,00 0,00 90,69
Durchschn.: all 1,74 0,00 13,52 27,18 0,00 57,55

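This should give you an impression of your system's health. In this example the server has a massive %iowait problem: from 12:00 to 14:30 the CPUs spent roughly 32 to 40 percent of their time waiting for I/O. The history also shows that the problem was fixed sometime between 14:30 and 14:40, when %iowait drops to 8.21 and afterwards stays below 5. (The sar output comes from a German locale, so decimals are written with commas and the last row, "Durchschn.", is the average.)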
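Scanning a long sar listing by eye gets tedious, so here is a rough sketch of a filter that prints only the intervals where %iowait is above a limit. `high_iowait` is a hypothetical name of my own, and the function assumes the column layout shown above (time, CPU, %user, %nice, %system, %iowait, %steal, %idle) with 24-hour timestamps. On a system where sar prints an extra AM/PM column the field number would have to be adjusted.

```shell
#!/bin/sh
# Hypothetical helper: read sar CPU output on stdin and print the
# timestamp and %iowait of every interval above the given limit.
# Assumes %iowait is the 6th field, as in the sample output above.
high_iowait() {
    limit="${1:-20}"
    LC_ALL=C awk -v limit="$limit" '
        # Only data rows: a HH:MM:SS timestamp followed by "all".
        $2 == "all" && $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            iowait = $6
            sub(/,/, ".", iowait)   # accept German-style decimal commas
            if (iowait + 0 > limit + 0)
                print $1, iowait
        }
    '
}

# Three rows copied from the sample output above.
high_iowait 20 <<'EOF'
14:20:02 all 2,68 0,00 16,48 35,48 0,00 45,36
14:30:01 all 1,93 0,00 15,84 34,69 0,00 47,53
14:40:01 all 0,80 0,00 6,46 8,21 0,00 84,53
EOF
```

With these three rows the filter prints the 14:20:02 and 14:30:01 intervals and skips 14:40:01. On a live system something like `sar | high_iowait 20` should work the same way, as long as the layout matches the assumption above.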

Jan 29, 2008