Technical Document Template (v100605)

Author: 王卫锋
Reviewer:
Category: Sun
Subcategory: OS, network
Updated: 2010-6-5
Keywords: Solaris 10, IPMP, Kernel Patches 141444-09 and 141445-09
Summary: Bug ID 6888928. On Solaris 10 with kernel patch 141444-09 (SPARC) or 141445-09 (x86), an IPMP group configured in Probe-Based mode suffers severe packet loss on the secondary NIC, and after an IPMP failover the floating IP can no longer be pinged.
Applies to: Solaris 10 with kernel patch 141444-09 (SPARC) or 141445-09 (x86)

Version History

    Version    Author/Modifier    Date        Change/Reason
    V100605    王卫锋             2010-6-5    Created

Contents

    Version History
    1 Bug Overview
    2 Cause of the Fault
    2.1 How to Identify a Probe-Based IPMP Configuration
    3 Symptoms
    3.1 Inspecting Packets with snoop
    4 Solution

1 Bug Overview

Bug ID: 6888928. On Solaris 10 with kernel patch 141444-09 (SPARC) or 141445-09 (x86), an IPMP group that uses Probe-Based failure detection reports spurious NIC failures, so the floating IP can no longer communicate normally. NICs in IPMP groups that use Link-Based failure detection are currently not affected by this bug.

2 Cause of the Fault

The problem occurs on the following systems:

SPARC: Solaris 10 with kernel patch 141444-09 installed and 142900-02 not yet installed.
x86: Solaris 10 with kernel patch 141445-09 installed and 142901-02 not yet installed.

Solaris 8, Solaris 9 and OpenSolaris are not affected. Only IPMP groups in Probe-Based mode trigger the problem; groups in Link-Based mode are not affected.

2.1 How to Identify a Probe-Based IPMP Configuration

All of the following conditions must hold:

1 The in.mpathd daemon is running. Check with:

    # ps -aef | grep in.mpathd
    root   211     1   0 11:04:51 ?        0:00 /usr/lib/inet/in.mpathd -a

2 In the output of ifconfig -a, the NICs carry the same groupname, that is, they belong to the same IPMP group:

    # ifconfig -a
    lo0: flags=2001000849 mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000
    e1000g1: flags=1000843 mtu 1500 index 5
            inet 192.178.100.1 netmask ffffff00 broadcast 192.178.100.255
            groupname fred
            ether 0:3:ba:d8:d1:ef
    e1000g1:1: flags=9040843 mtu 1500 index 5
            inet 192.178.100.2 netmask ffffff00 broadcast 192.178.100.255
    e1000g2: flags=1000843 mtu 1500 index 6
            inet 192.178.100.5 netmask ffffff00 broadcast 192.178.100.255
            groupname fred
            ether 0:4:23:c8:33:86
    e1000g2:1: flags=9040843 mtu 1500 index 6
            inet 192.178.100.6 netmask ffffff00 broadcast 192.178.100.255

In this example the groupname is fred.

3 In the output of ifconfig -a, the test addresses are flagged DEPRECATED and NOFAILOVER.

3 Symptoms

When the problem occurs, NICs in the IPMP group are reported as failed even though nothing is wrong with the network. /var/adm/messages contains entries such as:

    Oct 22 11:09:29 v4v-t2000a-sca11 in.mpathd[211]: NIC failure detected on e1000g2 of group fred
    Oct 22 11:09:29 v4v-t2000a-sca11 in.mpathd[211]: Successfully failed over from NIC e1000g2 to NIC e1000g1

ifconfig -a shows the NIC marked as FAILED:

    # ifconfig -a
    lo0: flags=2001000849 mtu 8232 index 1
            inet 127.0.0.1 netmask ff000000
    e1000g1: flags=1000843 mtu 1500 index 5
            inet 192.178.100.1 netmask ffffff00 broadcast 192.178.100.255
            groupname fred
            ether 0:3:ba:d8:d1:ef
    e1000g1:1: flags=9040843 mtu 1500 index 5
            inet 192.178.100.2 netmask ffffff00 broadcast 192.178.100.255
    e1000g1:2: flags=1000843 mtu 1500 index 5
            inet 192.178.100.5 netmask ffffff00 broadcast 192.178.100.255
    e1000g2: flags=19000842 mtu 0 index 6
            inet 0.0.0.0 netmask 0
            groupname fred
            ether 0:4:23:c8:33:86
    e1000g2:1: flags=19040843 mtu 1500 index 6
            inet 192.178.100.6 netmask ffffff00 broadcast 192.178.100.255

3.1 Inspecting Packets with snoop

e1000g2 is marked failed because e1000g1, rather than e1000g2, receives the ICMP replies addressed to the test address 192.178.100.6. snoop shows that every ICMP echo request with source address 192.178.100.6 is sent out through e1000g1, and that the replies to 192.178.100.6 also arrive on e1000g1:

    # snoop -d e1000g1 icmp
    Using device e1000g1 (promiscuous mode)
    192.178.100.6 -> 192.178.100.15 ICMP Echo request (ID: 54022 Sequence number: 1674)
    192.178.100.15 -> 192.178.100.6 ICMP Echo reply (ID: 54022 Sequence number: 1674)
    192.178.100.2 -> 192.178.100.15 ICMP Echo request (ID: 54021 Sequence number: 1680)
    192.178.100.15 -> 192.178.100.2 ICMP Echo reply (ID: 54021 Sequence number: 1680)
    192.178.100.6 -> 192.178.100.10 ICMP Echo request (ID: 54022 Sequence number: 1675)
    192.178.100.10 -> 192.178.100.6 ICMP Echo reply (ID: 54022 Sequence number: 1675)

Running snoop on e1000g2 shows no ICMP packets being sent or received:

    # snoop -d e1000g2 icmp
    Using device e1000g2 (promiscuous mode)

4 Solution

1 Apply the kernel patch

SPARC: patch Solaris 10 to 142900-02 or later.
x86: patch Solaris 10 to 142901-02 or later.

2 Switch IPMP to Link-Based mode

    # more /etc/hostname.e1000g1
    192.178.100.1 netmask + broadcast + group fred up
    # more /etc/hostname.e1000g2
    192.178.100.5 netmask + broadcast + deprecated -failover group fred up
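The ifconfig listings in this note show only the hexadecimal flags value, so condition 3 of section 2.1 (test addresses flagged DEPRECATED and NOFAILOVER) and the FAILED state from section 3 cannot be read off directly. The following is a minimal sketch of decoding those bits by hand. The function names has_flag and describe_flags are hypothetical, and the bit values are assumed to match the IFF_* definitions in the Solaris header net/if.h; verify them on the target system. It uses $(( )) arithmetic, so it needs bash or ksh rather than the legacy Solaris /bin/sh.

```shell
#!/bin/bash
# Assumed bit values (taken to match Solaris <net/if.h>; verify locally).
IFF_DEPRECATED=0x00040000
IFF_NOFAILOVER=0x08000000
IFF_FAILED=0x10000000

# has_flag HEXVALUE MASK
# Succeeds when the bit in MASK is set in HEXVALUE (given without "0x",
# exactly as printed after "flags=" by ifconfig).
has_flag() {
    [ $(( 0x$1 & $2 )) -ne 0 ]
}

# describe_flags HEXVALUE
# Prints "test-address" when both DEPRECATED and NOFAILOVER are set,
# otherwise "other"; appends " FAILED" when the FAILED bit is set.
describe_flags() {
    if has_flag "$1" "$IFF_DEPRECATED" && has_flag "$1" "$IFF_NOFAILOVER"; then
        kind="test-address"
    else
        kind="other"
    fi
    if has_flag "$1" "$IFF_FAILED"; then
        echo "$kind FAILED"
    else
        echo "$kind"
    fi
}

# Flag values copied from the ifconfig listings in this note.
for f in 1000843 9040843 19040843 19000842; do
    echo "flags=$f: $(describe_flags "$f")"
done
```

With the values from this note, 9040843 (e1000g1:1 and e1000g2:1 before the failure) decodes as a test address, and 19040843 (e1000g2:1 after the failure) decodes as a test address in FAILED state.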