Results tagged “linux” from WHO IS 2HEI?

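There are two ways to find out which driver a network card uses:

1. Search the system log for driver messages:
grep -i 'driver' /var/log/messages
or
dmesg | grep -i driver

2. Run lsmod to list the loaded modules, pick out the network card module, and inspect it with modinfo: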

[root@2hei.net]# modinfo e1000
filename:       /lib/modules/2.6.9-34.ELsmp/kernel/drivers/net/e1000/e1000.ko
parm:           debug:Debug level (0=none,...,16=all)
version:        6.1.16-k3-NAPI 4BCC06D27AAC4C711223CC9
license:        GPL
description:    Intel(R) PRO/1000 Network Driver
author:         Intel Corporation, <linux.nics@intel.com>


[root@2hei.net]# modinfo igb
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/net/igb/igb.ko
version:        1.3.16-k2
license:        GPL
description:    Intel(R) Gigabit Ethernet Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
srcversion:     78555F0A019E05BADBD95AA

[root@2hei.net]# modinfo bonding
filename:       /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/net/bonding/bonding.ko
author:         Thomas Davis, tadavis@lbl.gov and many others
description:    Ethernet Channel Bonding Driver, v3.4.0
version:        3.4.0
license:        GPL
srcversion:     7989A7EEF2EE7B5D78C0E79
depends:        ipv6
vermagic:       2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
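If you already know the interface name, sysfs can name the driver directly. The following is a minimal sketch, assuming a 2.6-or-later kernel that exposes /sys/class/net/<iface>/device/driver. The function name nic_driver and its optional sysfs-root argument are my own, added only to make the lookup easy to try. Where it is installed, ethtool -i eth0 reports the same information.

```shell
#!/bin/sh
# nic_driver IFACE [SYSFS_ROOT]
# Prints the driver name behind a network interface by following the
# sysfs "driver" symlink. SYSFS_ROOT defaults to /sys (hypothetical
# parameter, handy for trying the function on a fake tree).
nic_driver() {
    link="${2:-/sys}/class/net/$1/device/driver"
    [ -L "$link" ] && basename "$(readlink "$link")"
}

# Example call; eth0 is only an assumed interface name.
nic_driver eth0 || echo "eth0: no driver link found"
```

Virtual interfaces such as bond0 have no device link, so for those the fallback message is printed.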
cat /proc/mdstat

Personalities : 
[raid1] read_ahead 1024 sectors 
md1 : active raid1 sda3[0] sdb3[1] 522048 blocks [2/2] [U_] 
md0 : active raid1 sda2[0] sdb2[1] 4192896 blocks [2/2] [U_] 
md2 : active raid1 sda1[0] sdb1[1] 128384 blocks [2/2] [U_] 
unused devices: <none> 

This shows that disk sdb has failed (every array reports [U_]), so we will replace it.

The procedure follows this guide:

Replacing A Failed Hard Drive In A Software RAID1 Array
Version 1.0 
Author: Falko Timme <ft [at] falkotimme [dot] com> 
Last edited 01/21/2007

This guide shows how to remove a failed hard drive from a Linux RAID1 array (software RAID), and how to add a new hard disk to the RAID1 array without losing data.

I do not issue any guarantee that this will work for you!

 

1 Preliminary Note
In this example I have two hard drives, /dev/sda and /dev/sdb, with the partitions /dev/sda1 and /dev/sda2 as well as /dev/sdb1 and /dev/sdb2.

/dev/sda1 and /dev/sdb1 make up the RAID1 array /dev/md0.

/dev/sda2 and /dev/sdb2 make up the RAID1 array /dev/md1.

/dev/sda1 + /dev/sdb1 = /dev/md0

/dev/sda2 + /dev/sdb2 = /dev/md1

/dev/sdb has failed, and we want to replace it.

 

2 How Do I Tell If A Hard Disk Has Failed?
If a disk has failed, you will probably find a lot of error messages in the log files, e.g. /var/log/messages or /var/log/syslog.
You can also run
cat /proc/mdstat
and instead of the string [UU] you will see [U_] if you have a degraded RAID1 array.
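The [U_] check can be scripted. This is a minimal sketch, not part of the original guide: the function name degraded_arrays is hypothetical, and it assumes the /proc/mdstat layouts shown in this post (status either on the md line itself or on the line after it).

```shell
#!/bin/sh
# degraded_arrays [FILE]
# Prints the name of every md array whose status field contains "_",
# i.e. at least one missing member. FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}

# Only query the live file when the kernel provides it.
if [ -r /proc/mdstat ]; then
    degraded_arrays
fi
```

An empty result means no array is degraded.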
 

3 Removing The Failed Disk
To remove /dev/sdb, we will mark /dev/sdb1 and /dev/sdb2 as failed and remove them from their respective RAID arrays (/dev/md0 and /dev/md1).
First we mark /dev/sdb1 as failed:
mdadm --manage /dev/md0 --fail /dev/sdb1
The output of
cat /proc/mdstat
should look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[2](F)
      24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]

unused devices: <none>

Then we remove /dev/sdb1 from /dev/md0:

mdadm --manage /dev/md0 --remove /dev/sdb1

The output should be like this:

server1:~# mdadm --manage /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1

And

cat /proc/mdstat

should show this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]

unused devices: <none>

Now we do the same steps again for /dev/sdb2 (which is part of /dev/md1):

mdadm --manage /dev/md1 --fail /dev/sdb2

cat /proc/mdstat

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[2](F)
      24418688 blocks [2/1] [U_]

unused devices: <none>


mdadm --manage /dev/md1 --remove /dev/sdb2

server1:~# mdadm --manage /dev/md1 --remove /dev/sdb2
mdadm: hot removed /dev/sdb2

cat /proc/mdstat

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0]
      24418688 blocks [2/1] [U_]

md1 : active raid1 sda2[0]
      24418688 blocks [2/1] [U_]

unused devices: <none>
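With more than two arrays, the fail/remove pairs above get repetitive. Here is a small dry-run sketch that only prints the commands so they can be reviewed first. The helper name plan_removal and the "array:partition" argument format are my own invention, and the mdadm commands are exactly the ones used above.

```shell
#!/bin/sh
# plan_removal ARRAY:PARTITION...
# Prints (does not run) the mdadm commands that mark each partition
# as failed and then remove it from its array.
plan_removal() {
    for pair in "$@"; do
        md=${pair%%:*}     # text before the first colon
        part=${pair#*:}    # text after the first colon
        echo "mdadm --manage $md --fail $part"
        echo "mdadm --manage $md --remove $part"
    done
}

plan_removal /dev/md0:/dev/sdb1 /dev/md1:/dev/sdb2
```

Once the printed list looks right, it can be piped to sh as root.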

Then power down the system:

shutdown -h now

and replace the old /dev/sdb hard drive with a new one (it must have at least the same size as the old one - if it's only a few MB smaller than the old one then rebuilding the arrays will fail).

 
4 Adding The New Hard Disk
After you have changed the hard disk /dev/sdb, boot the system.
The first thing we must do now is to create the exact same partitioning as on /dev/sda. We can do this with one simple command:
sfdisk -d /dev/sda | sfdisk /dev/sdb
You can run
fdisk -l
to check if both hard drives have the same partitioning now.
Next we add /dev/sdb1 to /dev/md0 and /dev/sdb2 to /dev/md1:

mdadm --manage /dev/md0 --add /dev/sdb1

server1:~# mdadm --manage /dev/md0 --add /dev/sdb1
mdadm: re-added /dev/sdb1

mdadm --manage /dev/md1 --add /dev/sdb2

server1:~# mdadm --manage /dev/md1 --add /dev/sdb2
mdadm: re-added /dev/sdb2

Now both arrays (/dev/md0 and /dev/md1) will be synchronized. Run

cat /proc/mdstat
to see when it's finished.
During the synchronization the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/1] [U_]
      [=>...................]  recovery =  9.9% (2423168/24418688) finish=2.8min speed=127535K/sec

md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/1] [U_]
      [=>...................]  recovery =  6.4% (1572096/24418688) finish=1.9min speed=196512K/sec

unused devices: <none>
When the synchronization is finished, the output will look like this:

server1:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid5] [raid4] [raid6] [raid10]
md0 : active raid1 sda1[0] sdb1[1]
      24418688 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      24418688 blocks [2/2] [UU]

unused devices: <none>

That's it, you have successfully replaced /dev/sdb!
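Instead of re-running cat /proc/mdstat by hand, the rebuild can be polled. This is a rough sketch under the assumption that a running rebuild always shows a "recovery =" or "resync =" line as in the output above. The function name sync_in_progress is hypothetical.

```shell
#!/bin/sh
# sync_in_progress [FILE]
# Succeeds (exit 0) while FILE, /proc/mdstat by default, still
# contains a recovery/resync progress line.
sync_in_progress() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}" 2>/dev/null
}

if sync_in_progress; then
    echo "rebuild still running"
else
    echo "no rebuild in progress"
fi
```

To block until the arrays are back to [UU], wrap it in a loop such as: while sync_in_progress; do sleep 30; done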
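After a minimal install of CentOS 5.3 in a virtual machine, I found that the very handy rz and sz tools were not installed, so I installed them by hand.

wget http://mirrors.163.com/centos/5/os/i386/CentOS/lrzsz-0.12.20-22.1.i386.rpm
(Good news: 163 and sohu now both host open-source mirrors, and they are very fast!)

rpm -ivh lrzsz-0.12.20-22.1.i386.rpm

Or install it directly with yum:
yum install lrzsz

Then, in your SecureCRT or Xshell terminal:

sz filename    (send a file from the server to your local machine)
rz             (receive a file from your local machine)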
 

We have five colocated Dell servers running CentOS 4.x and 5.x. Sometimes a file system (ext3) goes read-only.
I’d like to know what could be causing this.

CentOS 5.x Read-only file system

From http://www.cyberciti.biz/tips/linux-filesytem-goes-read-only.html:
a) Hardware problem / hard disk problem, check harddisk for errors.

b) High disk I/O aka busy I/O retry error can mark low level disk call as failed. This will force ext3 to go into read only mode.

c) High disk I/O on SAN

d) SAN is not configured properly for the path failover.

In all of these cases ext3 goes read-only to protect the filesystem from further damage. If you are using VMware, check the official website for SCSI patches or workarounds for VMware problems.
So what could be causing the file system on Linux to go read-only?

Apart from the generic problems above, other errors can also make a filesystem on Linux go read-only. I hope our readers and seasoned Linux admins can help answer this question.

In my case, rebooting the OS recovered the file system.

You can also try remounting it:
mount -o remount,rw /
mount -o remount /
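Before remounting, it helps to confirm which file systems are actually read-only. This is a minimal sketch that reads the options column of /proc/mounts. The function name readonly_mounts is my own, and some pseudo file systems are legitimately mounted ro, so the output needs a quick sanity check.

```shell
#!/bin/sh
# readonly_mounts [FILE]
# Prints the mount point of every entry whose option list contains
# the standalone option "ro". FILE defaults to /proc/mounts.
readonly_mounts() {
    awk '$4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    readonly_mounts
fi
```

The pattern matches only a whole "ro" option, so an entry like errors=remount-ro on a read-write mount is not reported.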

Nov 19 00:03:14 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81004f0c7380)
Nov 19 00:03:14 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:03:14 2hei.net kernel: command: Read(10): 28 00 00 26 15 d7 00 00 48 00
Nov 19 00:03:14 2hei.net kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Nov 19 00:03:14 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81004f0c7380)
Nov 19 00:03:24 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81004f0c7380)
Nov 19 00:03:24 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:03:24 2hei.net kernel: command: Test Unit Ready: 00 00 00 00 00 00
Nov 19 00:03:25 2hei.net kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Nov 19 00:03:25 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81004f0c7380)
Nov 19 00:03:25 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81006dce49c0)
Nov 19 00:03:25 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:03:25 2hei.net kernel: command: Write(10): 2a 00 04 c4 a5 f2 00 01 40 00
Nov 19 00:12:10 2hei.net kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81006dce49c0)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81006dce49c0)
Nov 19 00:12:11 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:12:11 2hei.net kernel: command: Test Unit Ready: 00 00 00 00 00 00
Nov 19 00:12:11 2hei.net kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81006dce49c0)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007f0d4980)



Linux 2hei.net.163.com 2.6.18-53.1.13.el5 #1 SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[root@2hei.net ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: DF072A9844 Rev: HPD0
Type: Direct-Access ANSI SCSI revision: 05

[root@2hei.net ~]# modinfo mptsas
filename: /lib/modules/2.6.18-53.1.13.el5/kernel/drivers/message/fusion/mptsas.ko
version: 3.04.04
license: GPL
description: Fusion MPT SAS Host driver
author: LSI Logic Corporation
srcversion: D0507BBC402F30BCC5539C0
alias: pci:v00001000d00000062sv*sd*bc*sc*i*
alias: pci:v00001000d00000058sv*sd*bc*sc*i*
alias: pci:v00001000d00000056sv*sd*bc*sc*i*
alias: pci:v00001000d00000054sv*sd*bc*sc*i*
alias: pci:v00001000d00000050sv*sd*bc*sc*i*
depends: mptscsih,mptbase,scsi_mod,scsi_transport_sas
vermagic: 2.6.18-53.1.13.el5 SMP mod_unload gcc-4.1
parm: mpt_pt_clear: Clear persistency table: enable=1 (default=MPTSCSIH_PT_CLEAR=0) (int)
parm: max_lun: max lun, default=16895 (int)


Do you have acpi=off parameter passed to kernel? Try to set it on.
Also, do you use ReiserFS as filesystem?

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00715023&lang=en&cc=us&taskId=101&prodSeriesId=428936&prodTypeId=15351
http://h50176.www5.hp.com/support/292889-291/solve/108727.html

When logging in to a terminal over ssh or telnet, some machines show the following prompt:
[local@2hei.net ~]$ su -
Password:
Your default context is root:system_r:unconfined_t.

Do you want to choose a different one? [n]

[root@2hei.net ~]# getenforce
Permissive

Solution 1:
[root@2hei.net ~]#  vi /etc/pam.d/su  

session           required     /lib/security/$ISA/pam_selinux.so open multiple
change it to
session           required     /lib/security/$ISA/pam_selinux.so open

Solution 2:
vi /etc/selinux/config
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.

SELINUX=permissive
change it to
SELINUX=disabled

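Reboot the system for SELINUX=disabled to take effect. Note that the following command only switches the running system to permissive mode; it cannot disable SELinux:
setenforce 0

Use getenforce to check the result.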
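The edit to /etc/selinux/config can also be made non-interactively. The sketch below assumes GNU sed (for -i) and demonstrates on a temporary copy rather than the real file. The function name disable_selinux_in and the sample SELINUXTYPE=targeted line are assumptions for illustration.

```shell
#!/bin/sh
# disable_selinux_in FILE
# Rewrites the SELINUX= setting in FILE to "disabled", leaving comment
# lines and SELINUXTYPE= untouched. Requires GNU sed for in-place editing.
disable_selinux_in() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Demonstration on a throwaway copy with sample content.
cfg=$(mktemp)
printf '%s\n' '# SELINUX= can take one of these three values:' \
    'SELINUX=permissive' 'SELINUXTYPE=targeted' > "$cfg"
disable_selinux_in "$cfg"
cat "$cfg"
rm -f "$cfg"
```

On the real system the call would be disable_selinux_in /etc/selinux/config, followed by a reboot.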

Configuration (locking accounts after repeated failed logins with pam_tally):
vi /etc/pam.d/system-auth

auth        required      /lib/security/$ISA/pam_tally.so onerr=fail no_magic_root
account     required      /lib/security/$ISA/pam_tally.so deny=5 no_magic_root reset

# Notes
deny=5 : Deny access if the tally for this user exceeds 5.
lock_time=180 : Always deny for 180 seconds after failed attempt. There is also unlock_time=n option. It allow access after n seconds after failed attempt. If this option is used the user will be locked out for the specified amount of time after he exceeded his maximum allowed attempts. Otherwise the account is locked until the lock is removed by a manual intervention of the system administrator.
magic_root : If the module is invoked by a user with uid=0 the counter is not incremented. The sys-admin should use this for user launched services, like su, otherwise this argument should be omitted.
no_magic_root : Avoid root account locking, if the module is invoked by a user with uid=0

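Failed logins, and therefore locked users, are recorded in:
/var/log/faillog

To clear the locks periodically, use a cron job (as listed by crontab -l):
crontab -l
*/30 * * * * /sbin/pam_tally --reset
or
*/30 * * * * faillog -r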
On Linux, get the time 3 days, 1 hour and 1 minute ago (an odd idea):
date +%Y%m%d%H%M -d "3 days ago 1 hours ago 1 minutes ago"; date +%Y%m%d%H%M
200904110904
200904141005

Using date on FreeBSD:
Yesterday:  date -v -24H "+%Y%m%d"
Last month: date -v -1m "+%Y%m"
Last year:  date -v -1y "+%Y"

Just a memo.
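The same offset can also be computed with plain epoch arithmetic, which avoids relying on how date parses chained "ago" phrases. This is a sketch assuming GNU date (the -d "@SECONDS" form). The function name time_before is hypothetical, and it prints UTC so the result does not depend on the local time zone.

```shell
#!/bin/sh
# time_before EPOCH_SECONDS
# Prints the time 3 days, 1 hour and 1 minute before the given epoch
# timestamp, formatted as YYYYmmddHHMM in UTC. Needs GNU date.
time_before() {
    offset=$(( 3 * 86400 + 1 * 3600 + 1 * 60 ))
    date -u -d "@$(( $1 - offset ))" +%Y%m%d%H%M
}

# Relative to the current time:
time_before "$(date +%s)"
```

Because the input is an explicit timestamp, the function is easy to check against a known date.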
Apache finally stopped serving. The system runs a 2.6 kernel on a 64-bit OS with 2G of RAM.
/var/log/messages contained the following errors:
Mar  8 20:18:24 2hei-net kernel: oom-killer: gfp_mask=0x1d2
Mar  8 20:18:24 2hei-net kernel: Mem-info:
Mar  8 20:18:24 2hei-net kernel: Node 0 DMA per-cpu:
Mar  8 20:18:24 2hei-net kernel: cpu 0 hot: low 2, high 6, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 0 cold: low 0, high 2, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 1 hot: low 2, high 6, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 1 cold: low 0, high 2, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 2 hot: low 2, high 6, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 2 cold: low 0, high 2, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 3 hot: low 2, high 6, batch 1
Mar  8 20:18:24 2hei-net kernel: cpu 3 cold: low 0, high 2, batch 1
Mar  8 20:18:24 2hei-net kernel: Node 0 Normal per-cpu:
Mar  8 20:18:26 2hei-net kernel: cpu 0 hot: low 32, high 96, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 0 cold: low 0, high 32, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 1 hot: low 32, high 96, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 1 cold: low 0, high 32, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 2 hot: low 32, high 96, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 2 cold: low 0, high 32, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 3 hot: low 32, high 96, batch 16
Mar  8 20:18:26 2hei-net kernel: cpu 3 cold: low 0, high 32, batch 16
Mar  8 20:18:26 2hei-net kernel: Node 0 HighMem per-cpu: empty
Mar  8 20:18:26 2hei-net kernel:
Mar  8 20:18:26 2hei-net kernel: Free pages:       17536kB (0kB HighMem)
Mar  8 20:18:26 2hei-net kernel: Active:257583 inactive:239023 dirty:0 writeback:0 unstable:0 free:4384 slab:3787 mapped:497010 pagetables:2846
Mar  8 20:18:26 2hei-net kernel: Node 0 DMA free:11832kB min:44kB low:88kB high:132kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Mar  8 20:18:26 2hei-net kernel: protections[]: 0 0 0
Mar  8 20:18:26 2hei-net kernel: Node 0 Normal free:5704kB min:5720kB low:11440kB high:17160kB active:1029692kB inactive:956732kB present:2080416kB pages_scanned:3188856 all_unreclaimable? yes
Mar  8 20:18:26 2hei-net kernel: protections[]: 0 0 0
Mar  8 20:18:26 2hei-net kernel: Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Mar  8 20:18:26 2hei-net kernel: protections[]: 0 0 0
Mar  8 20:18:26 2hei-net kernel: Node 0 DMA: 6*4kB 4*8kB 2*16kB 3*32kB 2*64kB 2*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11832kB
Mar  8 20:18:26 2hei-net kernel: Node 0 Normal: 0*4kB 1*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 5704kB
Mar  8 20:18:26 2hei-net kernel: Node 0 HighMem: empty
Mar  8 20:18:26 2hei-net kernel: Swap cache: add 1070476, delete 1070476, find 119872/179715, race 0+20
Mar  8 20:18:26 2hei-net kernel: Free swap:            0kB
Mar  8 20:18:26 2hei-net kernel: 524200 pages of RAM
Mar  8 20:18:27 2hei-net kernel: 10214 reserved pages
Mar  8 20:18:27 2hei-net kernel: 67015 pages shared
Mar  8 20:18:27 2hei-net kernel: 0 pages swap cached
Mar  8 20:18:27 2hei-net kernel: Out of Memory: Killed process 26079 (httpd).

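This is how I finally discovered Linux's OOM killer (Out of Memory: Killed process). When the system runs out of memory (here free swap was down to 0kB), the OOM killer picks a process, typically one of the largest memory consumers, and kills it. Judging from painful stories online, mysql, oracle and apache have all been OOM-killed.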
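To see which processes the OOM killer has taken out, the log can be filtered for the final line of each report. This is a minimal sketch assuming the exact message format shown above ("Out of Memory: Killed process PID (name)."). Other kernel versions word this line differently, and the function name oom_victims is my own.

```shell
#!/bin/sh
# oom_victims [FILE]
# Prints "PID NAME" for every OOM kill recorded in FILE
# (default /var/log/messages), based on the 2.6-era message format.
oom_victims() {
    sed -nE 's/.*Out of Memory: Killed process ([0-9]+) \(([^)]*)\).*/\1 \2/p' \
        "${1:-/var/log/messages}"
}

if [ -r /var/log/messages ]; then
    oom_victims
fi
```

Piping the result through sort | uniq -c shows whether the same service is killed repeatedly.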

Since my machine runs nothing but Apache, I took another look at my Apache configuration:
<IfModule worker.c>
StartServers         2
MaxClients         2048
MinSpareThreads     25
MaxSpareThreads     75
ThreadsPerChild     128
MaxRequests
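I had only wanted Apache to support more connections; I didn't expect the MaxClients setting to affect the stability of the system.

With this configuration, VIRT already showed 1702m right after startup, and RES started at 11m and climbed steadily.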
 PID USER      PR  NI %CPU    TIME+  %MEM  VIRT  RES  SHR S COMMAND                                                                
 7009 apache    16   0    1   0:06.95  0.8  1702m  11m 2172 S /home/local/apache/bin/httpd                            
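After running for a few days, VIRT was over 2G and RES was close to 1G, and today it finally died: the httpd process was killed by Linux.

So I tried changing Apache's httpd.conf. With MaxClients set to 1536, VIRT at startup reaches 1024m; with 1024 it is 776m. On a machine with 2G of RAM it is therefore best not to go above 1536.
Reference: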
http://lwn.net/Articles/104179/



fping is a nice network testing tool for Linux. Most people already know it, so I won't say much; this is just a note for my own reference.

http://fping.sourceforge.net/

./fping --help
./fping: invalid option -- -

Usage: ./fping [options] [targets...]
   -a         show targets that are alive
   -A         show targets by address
   -b n       amount of ping data to send, in bytes (default 56)
   -B f       set exponential backoff factor to f
   -c n       count of pings to send to each target (default 1)
   -C n       same as -c, report results in verbose format
   -e         show elapsed time on return packets
   -f file    read list of targets from a file ( - means stdin) (only if no -g specified)
   -g         generate target list (only if no -f specified)
                (specify the start and end IP in the target list, or supply a IP netmask)
                (ex. ./fping -g 192.168.1.0 192.168.1.255 or ./fping -g 192.168.1.0/24)
   -i n       interval between sending ping packets (in millisec) (default 25)
   -l         loop sending pings forever
   -m         ping multiple interfaces on target host
   -n         show targets by name (-d is equivalent)
   -p n       interval between ping packets to one target (in millisec)
                (in looping and counting modes, default 1000)
   -q         quiet (don't show per-target/per-ping results)
   -Q n       same as -q, but show summary every n seconds
   -r n       number of retries (default 3)
   -s         print final stats
   -t n       individual target initial timeout (in millisec) (default 500)
   -u         show targets that are unreachable
   -v         show version
   targets    list of targets to check (if no -f specified)

Ping a list of hosts in bulk:

cat host_list

192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5

./fping -f host_list
or scan a whole subnet:

./fping -g 192.168.1.0/24
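When the targets are a partial range rather than a whole subnet, the host_list file can be generated instead of typed. This is a small sketch; the helper name make_host_list and its three arguments are my own, and the addresses are the ones from the example above.

```shell
#!/bin/sh
# make_host_list PREFIX FIRST LAST
# Prints one address per line, from PREFIX.FIRST to PREFIX.LAST.
make_host_list() {
    i=$2
    while [ "$i" -le "$3" ]; do
        echo "$1.$i"
        i=$((i + 1))
    done
}

make_host_list 192.168.1 2 5
```

Redirect the output to host_list, then run ./fping -a -f host_list to print only the targets that are alive (-a, as listed in the usage text above).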


I received a server alert and found many entries like the following in /var/log/messages:
Dec  9 00:03:22 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ec8bb00)
Dec  9 00:03:22 kernel: sd 0:0:0:0:
Dec  9 00:03:22 kernel:         command: Read(10): 28 00 05 4c 03 6a 00 01 00 00
Dec  9 00:03:23 kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Dec  9 00:03:23 kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ec8bb00)
Dec  9 00:03:33 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ec8bb00)
Dec  9 00:03:33 kernel: sd 0:0:0:0:
Dec  9 00:03:33 kernel:         command: Test Unit Ready: 00 00 00 00 00 00
Dec  9 00:03:33 kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Dec  9 00:03:33 kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ec8bb00)
Dec  9 00:03:33 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810077e51380)
Dec  9 00:03:33 kernel: sd 0:0:0:0:
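To judge how often this happens, the aborts can be counted. This is a rough sketch that simply counts the "attempting task abort" lines shown above. The function name count_task_aborts is hypothetical.

```shell
#!/bin/sh
# count_task_aborts [FILE]
# Prints how many "attempting task abort" messages FILE contains
# (default /var/log/messages). Prints 0 when there are none.
count_task_aborts() {
    awk '/attempting task abort/ { n++ } END { print n + 0 }' \
        "${1:-/var/log/messages}"
}

if [ -r /var/log/messages ]; then
    count_task_aborts
fi
```

Running it against the rotated logs as well (for example /var/log/messages.1) shows whether the problem is getting worse over time.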

System status:
uname -a
Linux 2.6.18-53.1.13.el5 #1 SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/redhat-release
CentOS release 5 (Final)

#smartctl -a /dev/sda
smartctl version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: HP       DF072A9844       Version: HPD0
Serial number: DQA2P6B00GMC0648
Device type: disk
Transport protocol: SAS
Local Time is: Tue Dec  9 09:33:35 2008 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     33 C
Drive Trip Temperature:        65 C
Manufactured in week 48 of year 2006
Current start stop count:      6 times
Recommended maximum start stop count:  10000 times
Elements in grown defect list: 0

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

Non-medium error count:     2293
No self-tests have been logged
Long (extended) Self Test duration: 1815 seconds [30.2 minutes]

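After a long search I could not see anything abnormal. Some sources online say the kernel's support for SAS disks is poor, and others blame Dell servers, but my machine is an HP!
On HP's official site I found a little information, about precautions to take during tape backup operations.

I will watch it for a few more days and keep searching for a solution. If nothing turns up, I plan to reinstall the system, downgrade the kernel, and go back to 32-bit to see what happens.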




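I have been using Resin 2.x all along without ever studying it in depth. While upgrading to Resin 3 recently I ran into a lot of problems, and I happened to notice that some large websites also run Resin on their back ends. I won't name them here, but what I saw were all fairly old versions, 2.1.3 or even the older 2.0.4, configured with load balancing and used together with Apache.

Below are some notes, summarized from the Resin 3 documentation, on running it under Linux:

Deployment Directories

When deploying, it's a good idea to create a bit of structure to make Resin and website upgrades easier and more maintainable.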

1、Create a user to run Resin (e.g. resin or another non-root user)

2、Link /usr/local/share/resin to the current Resin version. This is $RESIN_HOME.

3、Create a deployment root, e.g. /var/www, owned by the resin user. This is $RESIN_ROOT.

4、Put the modified resin.conf in /etc/resin/resin.conf

5、Put the site documents in /var/www/webapps/ROOT.

6、Put any .war files in /var/www/webapps.

7、Put any virtual hosts in /var/www/hosts/www.foo.com.

8、Output logs will appear in /var/www/log.

9、Create a startup script and configure the server to start it when the machine reboots.

10、start/stop and the watchdog process

unix> java -jar /usr/local/share/resin/lib/resin.jar \
         -conf /etc/resin/resin.conf \
         start

Configure the location of Java in JAVA_HOME

Configure the location of Resin in RESIN_HOME

Configure your web site directory in RESIN_ROOT

Select a server and pid file if you have multiple Resin servers. Start and stop the Resin daemon.

Of course, you can write your own script to start Resin, and you can also have Resin start automatically when the system boots.
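As a starting point for such a script, here is a minimal sketch that only assembles the command line from the variables listed above. The function name resin_cmd and the JAVA_HOME default are assumptions; the RESIN_HOME and config paths are the ones from the notes; and "stop" is taken from the start/stop item rather than from a tested setup.

```shell
#!/bin/sh
# Locations from the deployment notes; JAVA_HOME default is an assumption.
RESIN_HOME=${RESIN_HOME:-/usr/local/share/resin}
RESIN_CONF=${RESIN_CONF:-/etc/resin/resin.conf}
JAVA_HOME=${JAVA_HOME:-/usr/java/default}

# resin_cmd start|stop
# Prints the java command line for the requested action (dry run).
resin_cmd() {
    echo "$JAVA_HOME/bin/java -jar $RESIN_HOME/lib/resin.jar -conf $RESIN_CONF $1"
}

resin_cmd start
```

A real init script would run the printed command as the non-root resin user instead of echoing it.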
