Results tagged “linux” from WHO IS 2HEI?
1. Find the network card driver from the syslog
grep -i 'driver' /var/log/messages
or
dmesg | grep -i driver
2. lsmod
# Try to identify the network card type, then inspect the candidate modules with modinfo:
[root@2hei.net]# modinfo e1000
filename: /lib/modules/2.6.9-34.ELsmp/kernel/drivers/net/e1000/e1000.ko
parm: debug:Debug level (0=none,...,16=all)
version: 6.1.16-k3-NAPI 4BCC06D27AAC4C711223CC9
license: GPL
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
[root@2hei.net]# modinfo igb
filename: /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/net/igb/igb.ko
version: 1.3.16-k2
license: GPL
description: Intel(R) Gigabit Ethernet Network Driver
author: Intel Corporation, <e1000-devel@lists.sourceforge.net>
srcversion: 78555F0A019E05BADBD95AA
[root@2hei.net]# modinfo bonding
filename: /lib/modules/2.6.18-164.11.1.el5/kernel/drivers/net/bonding/bonding.ko
author: Thomas Davis, tadavis@lbl.gov and many others
description: Ethernet Channel Bonding Driver, v3.4.0
version: 3.4.0
license: GPL
srcversion: 7989A7EEF2EE7B5D78C0E79
depends: ipv6
vermagic: 2.6.18-164.11.1.el5 SMP mod_unload gcc-4.1
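To compare driver versions across several machines, it can help to turn modinfo output into structured data. A minimal Python sketch, assuming the plain "field: value" layout shown above; parse_modinfo is a hypothetical helper name, and you would feed it the stdout of modinfo <module>:

```python
def parse_modinfo(text):
    """Parse `modinfo` output into a dict mapping each field to a list of values.

    Fields such as alias and parm can repeat, so every value is kept.
    """
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like "debug:Debug level" keep theirs.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line (e.g. a shell prompt)
        info.setdefault(key.strip(), []).append(value.strip())
    return info
```

For example, parse_modinfo(subprocess.run(["modinfo", "igb"], capture_output=True, text=True).stdout)["version"] returns the version list for the installed igb module.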
I’d like to know what could be causing such a problem.
CentOS 5.x Read-only file system
From http://www.cyberciti.biz/tips/linux-filesytem-goes-read-only.html:
a) Hardware problem / hard disk problem: check the hard disk for errors.
b) High disk I/O: a busy-I/O retry error can mark a low-level disk call as failed, which forces ext3 into read-only mode.
c) High disk I/O on a SAN.
d) The SAN is not configured properly for path failover.
In all of these cases ext3 goes read-only to protect the filesystem from further damage. If you are using VMware, check the official website for SCSI patches or workarounds for VMware problems.
So what else could cause a Linux file system to go read-only?
Apart from the generic problems above, other errors can also make a Linux file system go read-only. I hope our readers and seasoned Linux admins can help answer this question.
Rebooting the OS recovered the file system.
You can also try remounting the root file system:
mount -o remount,rw /
mount -o remount /
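Before and after remounting, you can confirm which file systems the kernel currently has mounted read-only by looking at the options column of /proc/mounts. A minimal Python sketch, assuming the usual /proc/mounts layout (device, mount point, type, options, dump, pass); read_only_mounts is a hypothetical helper:

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains "ro".

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result
```

Call it as read_only_mounts(open("/proc/mounts").read()); an empty list means nothing is currently mounted read-only.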
Nov 19 00:03:14 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81004f0c7380)
Nov 19 00:03:14 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:03:14 2hei.net kernel: command: Read(10): 28 00 00 26 15 d7 00 00 48 00
Nov 19 00:03:14 2hei.net kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Nov 19 00:03:14 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81004f0c7380)
Nov 19 00:03:24 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81004f0c7380)
Nov 19 00:03:24 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:03:24 2hei.net kernel: command: Test Unit Ready: 00 00 00 00 00 00
Nov 19 00:03:25 2hei.net kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Nov 19 00:03:25 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81004f0c7380)
Nov 19 00:03:25 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81006dce49c0)
Nov 19 00:03:25 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:03:25 2hei.net kernel: command: Write(10): 2a 00 04 c4 a5 f2 00 01 40 00
Nov 19 00:12:10 2hei.net kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81006dce49c0)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81006dce49c0)
Nov 19 00:12:11 2hei.net kernel: sd 0:0:0:0:
Nov 19 00:12:11 2hei.net kernel: command: Test Unit Ready: 00 00 00 00 00 00
Nov 19 00:12:11 2hei.net kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81006dce49c0)
Nov 19 00:12:11 2hei.net kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007f0d4980)
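With many of these messages in the log, a quick tally of how often the controller attempted task aborts, and which SCSI commands were reported alongside them, can show whether reads, writes or the follow-up Test Unit Ready checks are affected. A minimal Python sketch, assuming the message format shown above; summarize_aborts and COMMAND_RE are hypothetical names:

```python
import re
from collections import Counter

# Matches the SCSI command name in lines such as
# "kernel: command: Read(10): 28 00 ..." or "kernel: command: Test Unit Ready: 00 ..."
COMMAND_RE = re.compile(r"command: ([A-Za-z ]+(?:\(\d+\))?):")


def summarize_aborts(log_lines):
    """Count task-abort attempts and the SCSI commands logged with them."""
    attempts = sum("attempting task abort!" in line for line in log_lines)
    commands = Counter()
    for line in log_lines:
        match = COMMAND_RE.search(line)
        if match:
            commands[match.group(1)] += 1
    return attempts, commands
```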
Linux 2hei.net.163.com 2.6.18-53.1.13.el5 #1 SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@2hei.net ~]# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: DF072A9844 Rev: HPD0
Type: Direct-Access ANSI SCSI revision: 05
[root@2hei.net ~]# modinfo mptsas
filename: /lib/modules/2.6.18-53.1.13.el5/kernel/drivers/message/fusion/mptsas.ko
version: 3.04.04
license: GPL
description: Fusion MPT SAS Host driver
author: LSI Logic Corporation
srcversion: D0507BBC402F30BCC5539C0
alias: pci:v00001000d00000062sv*sd*bc*sc*i*
alias: pci:v00001000d00000058sv*sd*bc*sc*i*
alias: pci:v00001000d00000056sv*sd*bc*sc*i*
alias: pci:v00001000d00000054sv*sd*bc*sc*i*
alias: pci:v00001000d00000050sv*sd*bc*sc*i*
depends: mptscsih,mptbase,scsi_mod,scsi_transport_sas
vermagic: 2.6.18-53.1.13.el5 SMP mod_unload gcc-4.1
parm: mpt_pt_clear: Clear persistency table: enable=1 (default=MPTSCSIH_PT_CLEAR=0) (int)
parm: max_lun: max lun, default=16895 (int)
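The alias lines list the PCI vendor and device IDs the mptsas driver binds to; vendor ID 1000 is LSI Logic, matching the author field. Comparing them with the numeric IDs printed by lspci -n is one way to confirm that the controller really is handled by this driver. A small Python sketch that decodes such an alias; pci_ids and PCI_ALIAS_RE are hypothetical names, assuming the pci:v<8 hex digits>d<8 hex digits> prefix shown above:

```python
import re

# PCI modalias prefix: "pci:v<vendor, 8 hex digits>d<device, 8 hex digits>sv...sd...".
PCI_ALIAS_RE = re.compile(r"^pci:v([0-9A-Fa-f]{8})d([0-9A-Fa-f]{8})")


def pci_ids(alias):
    """Return (vendor_id, device_id) as 4-digit hex strings, or None for non-PCI aliases."""
    match = PCI_ALIAS_RE.match(alias)
    if not match:
        return None
    vendor, device = (int(group, 16) for group in match.groups())
    return f"{vendor:04x}", f"{device:04x}"
```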
Is the acpi=off parameter passed to the kernel? If so, try turning ACPI back on.
Also, are you using ReiserFS as the file system?
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00715023&lang=en&cc=us&taskId=101&prodSeriesId=428936&prodTypeId=15351
http://h50176.www5.hp.com/support/292889-291/solve/108727.html
[local@2hei.net ~]$ su -
Password:
Your default context is root:system_r:unconfined_t.
Do you want to choose a different one? [n]
[root@2hei.net ~]# getenforce
Permissive
Solution 1:
[root@2hei.net ~]# vi /etc/pam.d/su
session required /lib/security/$ISA/pam_selinux.so open multiple
to
session required /lib/security/$ISA/pam_selinux.so open
Solution 2:
vi /etc/selinux/config
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - SELinux is fully disabled.
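Change
SELINUX=permissive
to
SELINUX=disabled
The change takes effect after a reboot. Until then, you can switch SELinux to permissive mode at runtime with:
setenforce 0
Use getenforce to check the result.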
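Note that SELINUX= in /etc/selinux/config is read at boot, while getenforce reports the mode the kernel is running in right now, so the two can disagree until you reboot. A small Python sketch for reading the configured value so it can be compared with getenforce; selinux_mode is a hypothetical helper, and it assumes the simple KEY=value format shown above:

```python
def selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config-style text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines without an assignment
        key, value = line.split("=", 1)
        if key.strip() == "SELINUX":  # exact match, so SELINUXTYPE is ignored
            return value.strip()
    return None
```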
vi /etc/pam.d/system-auth
auth required /lib/security/$ISA/pam_tally.so onerr=fail no_magic_root
account required /lib/security/$ISA/pam_tally.so deny=5 no_magic_root reset
# Notes
deny=5 : Deny access if the tally for this user exceeds 5 times.
lock_time=180 : Always deny for 180 seconds after a failed attempt. There is also an unlock_time=n option, which allows access again n seconds after a failed attempt. If this option is used, the user is locked out for the specified amount of time after exceeding the maximum allowed attempts; otherwise the account stays locked until the system administrator removes the lock manually.
magic_root : If the module is invoked by a user with uid=0, the counter is not incremented. The sysadmin should use this for user-launched services such as su; otherwise this argument should be omitted.
no_magic_root : Avoid locking the root account if the module is invoked by a user with uid=0.
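To make the interplay of deny and unlock_time concrete, here is a toy Python model that follows the option descriptions above. It is not the pam_tally source: the Tally class and its methods are hypothetical, magic_root is not modeled, and "locked" simply means the failure count exceeds deny:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tally:
    """Toy model of a per-user failure counter (not the real pam_tally code)."""
    deny: int                          # lock once the count exceeds this value
    unlock_time: Optional[int] = None  # seconds; None means only a manual reset unlocks
    failures: int = 0
    last_failure: float = 0.0

    def record_failure(self, now):
        self.failures += 1
        self.last_failure = now

    def is_locked(self, now):
        if self.failures <= self.deny:
            return False
        if self.unlock_time is None:
            return True  # stays locked until an administrator resets the counter
        return now - self.last_failure < self.unlock_time

    def reset(self):
        """Plays the role of pam_tally --reset or faillog -r in this model."""
        self.failures = 0
```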
Failure counts, and therefore locked-out users, are recorded in:
/var/log/faillog
To clear the locks, reset the counters periodically, e.g. from cron every 30 minutes:
crontab -l
*/30 * * * * /sbin/pam_tally --reset
or
*/30 * * * * faillog -r
date +%Y%m%d%H%M -d "3 days ago 1 hours ago 1 minutes ago"; date +%Y%m%d%H%M
200904110904
200904141005
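The same relative calculation can be reproduced in Python, which is handy for checking what the shell command should print. A minimal sketch using naive datetimes (so daylight-saving changes are ignored); ago and minutes_stamp are hypothetical names, and the fixed timestamp in the usage note mirrors the second output line above:

```python
from datetime import datetime, timedelta


def minutes_stamp(moment):
    """Format a datetime like `date +%Y%m%d%H%M`."""
    return moment.strftime("%Y%m%d%H%M")


def ago(now, days=0, hours=0, minutes=0):
    """Same idea as GNU date -d "N days ago N hours ago N minutes ago"."""
    return now - timedelta(days=days, hours=hours, minutes=minutes)
```

With now = datetime(2009, 4, 14, 10, 5), minutes_stamp(ago(now, days=3, hours=1, minutes=1)) gives "200904110904", matching the first output line; in practice you would pass datetime.now().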
Using date on FreeBSD:
Yesterday: date -v -24H "+%Y%m%d"
Last month: date -v -1m "+%Y%m"
Last year: date -v -1y "+%Y"
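FreeBSD's -v adjustments can be mirrored in Python as well. A sketch with hypothetical helper names; last_month steps the calendar month back directly instead of subtracting a fixed number of days, so the result does not depend on how many days the current month has, and yesterday ignores daylight-saving changes:

```python
from datetime import date, timedelta


def yesterday(today):
    """Like `date -v -24H "+%Y%m%d"` (ignoring daylight-saving changes)."""
    return (today - timedelta(days=1)).strftime("%Y%m%d")


def last_month(today):
    """Like `date -v -1m "+%Y%m"`: step the month back, wrapping January to December."""
    if today.month > 1:
        year, month = today.year, today.month - 1
    else:
        year, month = today.year - 1, 12
    return f"{year:04d}{month:02d}"


def last_year(today):
    """Like `date -v -1y "+%Y"`."""
    return f"{today.year - 1:04d}"
```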
Noted here for future reference.
/var/log/messages contained errors like the following:
Mar 8 20:18:24 2hei-net kernel: oom-killer: gfp_mask=0x1d2
Mar 8 20:18:24 2hei-net kernel: Mem-info:
Mar 8 20:18:24 2hei-net kernel: Node 0 DMA per-cpu:
Mar 8 20:18:24 2hei-net kernel: cpu 0 hot: low 2, high 6, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 0 cold: low 0, high 2, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 1 hot: low 2, high 6, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 1 cold: low 0, high 2, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 2 hot: low 2, high 6, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 2 cold: low 0, high 2, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 3 hot: low 2, high 6, batch 1
Mar 8 20:18:24 2hei-net kernel: cpu 3 cold: low 0, high 2, batch 1
Mar 8 20:18:24 2hei-net kernel: Node 0 Normal per-cpu:
Mar 8 20:18:26 2hei-net kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 2 hot: low 32, high 96, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 2 cold: low 0, high 32, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 3 hot: low 32, high 96, batch 16
Mar 8 20:18:26 2hei-net kernel: cpu 3 cold: low 0, high 32, batch 16
Mar 8 20:18:26 2hei-net kernel: Node 0 HighMem per-cpu: empty
Mar 8 20:18:26 2hei-net kernel:
Mar 8 20:18:26 2hei-net kernel: Free pages: 17536kB (0kB HighMem)
Mar 8 20:18:26 2hei-net kernel: Active:257583 inactive:239023 dirty:0 writeback:0 unstable:0 free:4384 slab:3787 mapped:497010 pagetables:2846
Mar 8 20:18:26 2hei-net kernel: Node 0 DMA free:11832kB min:44kB low:88kB high:132kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Mar 8 20:18:26 2hei-net kernel: protections[]: 0 0 0
Mar 8 20:18:26 2hei-net kernel: Node 0 Normal free:5704kB min:5720kB low:11440kB high:17160kB active:1029692kB inactive:956732kB present:2080416kB pages_scanned:3188856 all_unreclaimable? yes
Mar 8 20:18:26 2hei-net kernel: protections[]: 0 0 0
Mar 8 20:18:26 2hei-net kernel: Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Mar 8 20:18:26 2hei-net kernel: protections[]: 0 0 0
Mar 8 20:18:26 2hei-net kernel: Node 0 DMA: 6*4kB 4*8kB 2*16kB 3*32kB 2*64kB 2*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11832kB
Mar 8 20:18:26 2hei-net kernel: Node 0 Normal: 0*4kB 1*8kB 4*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 5704kB
Mar 8 20:18:26 2hei-net kernel: Node 0 HighMem: empty
Mar 8 20:18:26 2hei-net kernel: Swap cache: add 1070476, delete 1070476, find 119872/179715, race 0+20
Mar 8 20:18:26 2hei-net kernel: Free swap: 0kB
Mar 8 20:18:26 2hei-net kernel: 524200 pages of RAM
Mar 8 20:18:27 2hei-net kernel: 10214 reserved pages
Mar 8 20:18:27 2hei-net kernel: 67015 pages shared
Mar 8 20:18:27 2hei-net kernel: 0 pages swap cached
Mar 8 20:18:27 2hei-net kernel: Out of Memory: Killed process 26079 (httpd).
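When a log like this is long, the two facts that matter most are which process was killed and which memory zone ran dry. A minimal Python sketch, assuming the message formats shown above (they vary between kernel versions); parse_oom and the regular expressions are hypothetical. On the lines above it flags Node 0 Normal, where free (5704kB) had fallen below min (5720kB), and swap was already exhausted (Free swap: 0kB):

```python
import re

KILLED_RE = re.compile(r"Out of Memory: Killed process (\d+) \(([^)]+)\)")
# Captures node, zone name, free, min and present (all in kB).
ZONE_RE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB.*?present:(\d+)kB")


def parse_oom(log_lines):
    """Return (killed, low_zones) from kernel OOM-killer messages.

    killed: list of (pid, process name) from "Killed process" lines.
    low_zones: populated zones whose free memory was below their min watermark.
    """
    killed, low_zones = [], []
    for line in log_lines:
        k = KILLED_RE.search(line)
        if k:
            killed.append((int(k.group(1)), k.group(2)))
        z = ZONE_RE.search(line)
        if z:
            free_kb, min_kb, present_kb = (int(z.group(i)) for i in (3, 4, 5))
            if present_kb > 0 and free_kb < min_kb:  # skip empty zones such as HighMem here
                low_zones.append(f"Node {z.group(1)} {z.group(2)}")
    return killed, low_zones
```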
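So this is Linux's OOM killer ("Out of Memory: Killed process") at work. When the system runs out of memory (note that swap was also exhausted above: Free swap: 0kB), the kernel picks a process, usually the one using the most memory, and kills it. Painful stories online show MySQL, Oracle and Apache being killed this way.
Since this machine ran only Apache, I took another look at my Apache configuration: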
<IfModule worker.c>
StartServers 2
MaxClients 2048
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 128
MaxRequests
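I had raised MaxClients so that Apache could handle more connections, but that setting ended up undermining the system's stability.
With this configuration, the httpd process already showed a VIRT of 1702m right after startup, and RES started at 11m and grew steadily: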
PID USER PR NI %CPU TIME+ %MEM VIRT RES SHR S COMMAND
7009 apache 16 0 1 0:06.95 0.8 1702m 11m 2172 S /home/local/apache/bin/httpd
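After a few days VIRT had grown past 2 GB and RES was approaching 1 GB; today it finally fell over, and Linux killed the httpd process.
So I experimented with MaxClients in Apache's httpd.conf: with MaxClients 1536, VIRT at startup reached 1024m; with 1024, VIRT was 776m. On a machine with 2 GB of RAM, it is therefore best not to go above 1536.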
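One way to reason about these numbers: with the worker MPM, each child process runs ThreadsPerChild threads, so MaxClients / ThreadsPerChild child processes are needed to reach MaxClients. A small Python sketch of that arithmetic; worker_layout is hypothetical, and the ServerLimit (16) and ThreadLimit (64) defaults are assumed worker MPM defaults, since the quoted config does not show them. The sketch raises an error for out-of-range values rather than modeling how httpd handles them:

```python
import math


def worker_layout(max_clients, threads_per_child, server_limit=16, thread_limit=64):
    """Return (child processes, total threads) for a worker MPM configuration.

    server_limit and thread_limit stand in for ServerLimit and ThreadLimit;
    the defaults are assumptions, so pass the values from your own httpd.conf.
    """
    if threads_per_child > thread_limit:
        raise ValueError(f"ThreadsPerChild {threads_per_child} exceeds ThreadLimit {thread_limit}")
    children = math.ceil(max_clients / threads_per_child)
    if children > server_limit:
        raise ValueError(f"{children} children exceed ServerLimit {server_limit}")
    return children, children * threads_per_child
```

For example, worker_layout(1536, 128, thread_limit=128) returns (12, 1536): 12 children instead of the 16 needed for MaxClients 2048, so fewer processes and threads can be competing for memory.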
References:
http://lwn.net/Articles/104179/
fping is a handy network testing tool on Linux that most people already know well, so I won't say much; this is just a note for my own reference.
./fping --help
./fping: invalid option -- -
Usage: ./fping [options] [targets...]
-a show targets that are alive
-A show targets by address
-b n amount of ping data to send, in bytes (default 56)
-B f set exponential backoff factor to f
-c n count of pings to send to each target (default 1)
-C n same as -c, report results in verbose format
-e show elapsed time on return packets
-f file read list of targets from a file ( - means stdin) (only if no -g specified)
-g generate target list (only if no -f specified)
(specify the start and end IP in the target list, or supply a IP netmask)
(ex. ./fping -g 192.168.1.0 192.168.1.255 or ./fping -g 192.168.1.0/24)
-i n interval between sending ping packets (in millisec) (default 25)
-l loop sending pings forever
-m ping multiple interfaces on target host
-n show targets by name (-d is equivalent)
-p n interval between ping packets to one target (in millisec)
(in looping and counting modes, default 1000)
-q quiet (don't show per-target/per-ping results)
-Q n same as -q, but show summary every n seconds
-r n number of retries (default 3)
-s print final stats
-t n individual target initial timeout (in millisec) (default 500)
-u show targets that are unreachable
-v show version
targets list of targets to check (if no -f specified)
Pinging a list of hosts in batch:
cat host_list
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5
./fping -f host_list
or
./fping -g 192.168.1.0/24
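If you prefer to keep a host_list file like the one above rather than use -g, the addresses for a subnet can be generated with Python's ipaddress module. A minimal sketch; host_list and write_host_list are hypothetical helpers, and .hosts() skips the network and broadcast addresses (whether fping -g includes them is not assumed here):

```python
import ipaddress


def host_list(cidr):
    """Return the usable host addresses of a network as strings."""
    return [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]


def write_host_list(cidr, path):
    """Write one address per line, the layout used by the host_list file above."""
    with open(path, "w") as f:
        f.write("\n".join(host_list(cidr)) + "\n")
```

After write_host_list("192.168.1.0/24", "host_list"), the file can be passed to ./fping -f host_list.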
After receiving a server alert, I found many entries like the following in /var/log/messages:
Dec 9 00:03:22 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ec8bb00)
Dec 9 00:03:22 kernel: sd 0:0:0:0:
Dec 9 00:03:22 kernel: command: Read(10): 28 00 05 4c 03 6a 00 01 00 00
Dec 9 00:03:23 kernel: mptbase: ioc0: LogInfo(0x31140000): Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Dec 9 00:03:23 kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ec8bb00)
Dec 9 00:03:33 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff81007ec8bb00)
Dec 9 00:03:33 kernel: sd 0:0:0:0:
Dec 9 00:03:33 kernel: command: Test Unit Ready: 00 00 00 00 00 00
Dec 9 00:03:33 kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000)
Dec 9 00:03:33 kernel: mptscsih: ioc0: task abort: SUCCESS (sc=ffff81007ec8bb00)
Dec 9 00:03:33 kernel: mptscsih: ioc0: attempting task abort! (sc=ffff810077e51380)
Dec 9 00:03:33 kernel: sd 0:0:0:0:
System status:
uname -a
Linux 2.6.18-53.1.13.el5 #1 SMP Tue Feb 12 13:02:30 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release
CentOS release 5 (Final)
#smartctl -a /dev/sda
smartctl version 5.36 [x86_64-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: HP DF072A9844 Version: HPD0
Serial number: DQA2P6B00GMC0648
Device type: disk
Transport protocol: SAS
Local Time is: Tue Dec 9 09:33:35 2008 CST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Current Drive Temperature: 33 C
Drive Trip Temperature: 65 C
Manufactured in week 48 of year 2006
Current start stop count: 6 times
Recommended maximum start stop count: 10000 times
Elements in grown defect list: 0
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0
Non-medium error count: 2293
No self-tests have been logged
Long (extended) Self Test duration: 1815 seconds [30.2 minutes]
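Of the smartctl output above, a few lines are worth tracking over time: the overall health status, the grown defect list, and the non-medium error count (2293 here, which counts errors not tied to reading or writing the medium). A minimal Python sketch that pulls these out so they can be logged and compared between runs; smart_summary is hypothetical and assumes the field names printed by this smartctl version for a SAS disk:

```python
def smart_summary(report):
    """Pick a few health fields out of `smartctl -a` output for a SAS disk."""
    wanted = {
        "SMART Health Status": "health",
        "Elements in grown defect list": "grown_defects",
        "Non-medium error count": "non_medium_errors",
    }
    summary = {}
    for line in report.splitlines():
        # Split on the first colon: "Non-medium error count: 2293".
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            summary[wanted[key.strip()]] = value.strip()
    return summary
```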
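I searched for a long time but couldn't find anything abnormal. Some sources online say the kernel's support for SAS disks is poor, and others blame Dell servers, but my machine is an HP!
On HP's official site I found a little information, but it concerns precautions for tape backup operations.
I'll keep watching it for a few more days and keep searching for a solution. If there's still no result, I plan to reinstall the system with an older kernel version and go back to 32-bit.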
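I've used Resin 2.x for a long time without studying it in much depth. Recently I ran into many problems while upgrading to Resin 3, and along the way I happened to notice that some large websites also use Resin on their back end (I won't name them here). The versions I saw were all fairly old, 2.1.3 or even 2.0.4, set up with load balancing and used together with Apache.
Below is a summary of notes from the Resin 3 documentation on running it under Linux: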
Deployment directories: when deploying, it's a good idea to create a bit of structure to make Resin and website upgrades easier and more maintainable.
1. Create a user to run Resin (e.g. resin or another non-root user).
2. Link /usr/local/share/resin to the current Resin version. This is $RESIN_HOME.
3. Create a deployment root, e.g. /var/www, owned by the resin user. This is $RESIN_ROOT.
4. Put the modified resin.conf in /etc/resin/resin.conf.
5. Put the site documents in /var/www/webapps/ROOT.
6. Put any .war files in /var/www/webapps.
7. Put any virtual hosts in /var/www/hosts/www.foo.com.
8. Output logs will appear in /var/www/log.
9. Create a startup script and configure the server to start it when the machine reboots.
10. Start/stop and the watchdog process:
unix> java -jar /usr/local/share/resin/lib/resin.jar \
-conf /etc/resin/resin.conf \
start
Configure the location of Java in JAVA_HOME
Configure the location of Resin in RESIN_HOME
Configure your web site directory in RESIN_ROOT
Select a server and pid file if you have multiple Resin servers. Start and stop the Resin daemon.
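The start command above can also be wrapped in a small script. A minimal Python sketch that builds the same java -jar invocation from RESIN_HOME, the config path and JAVA_HOME; resin_command is a hypothetical helper, and its defaults are the example paths from the steps above:

```python
import os


def resin_command(action, resin_home="/usr/local/share/resin",
                  conf="/etc/resin/resin.conf", env=None):
    """Build the argv for: java -jar $RESIN_HOME/lib/resin.jar -conf <conf> <action>."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    # Use $JAVA_HOME/bin/java when JAVA_HOME is set, otherwise rely on PATH.
    java = os.path.join(java_home, "bin", "java") if java_home else "java"
    jar = os.path.join(resin_home, "lib", "resin.jar")
    return [java, "-jar", jar, "-conf", conf, action]
```

subprocess.run(resin_command("start"), check=True) would then start Resin, and resin_command("stop") builds the matching stop command.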
Of course, you can also write your own startup script for Resin, or have it start automatically when the system boots.