通过ssh登陆一台linux主机非常慢,基本上每次都在10s以上

[root@localhost ~]# time ssh 192.168.146.104

real	0m13.730s
user	0m0.014s
sys	0m0.009s

通过ssh -vvv查看建立链接的过程中详细日志,找出最慢的地方出现在哪里

[root@localhost ~]# ssh -vvv 192.168.146.104
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.146.104 [192.168.146.104] port 22.
debug1: Connection established.		 	<<== 说明发起的连接已经建立成功  
debug1: permanently_set_uid: 0/0
debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/identity-cert type -1
...
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug3: Wrote 960 bytes for a total of 981
debug1: SSH2_MSG_KEXINIT received
...
debug2: mac_setup: found hmac-sha1
debug1: kex: server->client aes128-ctr hmac-sha1 none
debug2: mac_setup: found hmac-sha1
debug1: kex: client->server aes128-ctr hmac-sha1 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<2048<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug3: Wrote 24 bytes for a total of 1005
debug2: dh_gen_key: priv key bits set: 170/320
debug2: bits set: 1026/2048
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug3: Wrote 272 bytes for a total of 1277
debug3: check_host_in_hostfile: host 192.168.146.104 filename /root/.ssh/known_hosts
debug3: check_host_in_hostfile: host 192.168.146.104 filename /root/.ssh/known_hosts
debug3: check_host_in_hostfile: match line 6
debug1: Host '192.168.146.104' is known and matches the RSA host key.
debug1: Found key in /root/.ssh/known_hosts:6
debug2: bits set: 1032/2048
debug1: ssh_rsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: Wrote 16 bytes for a total of 1293
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug3: Wrote 52 bytes for a total of 1345
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /root/.ssh/identity ((nil))
debug2: key: /root/.ssh/id_rsa ((nil))
debug2: key: /root/.ssh/id_dsa ((nil))
debug2: key: /root/.ssh/id_ecdsa ((nil))
debug3: Wrote 68 bytes for a total of 1413	<<==这一步完后大概等了10s		


debug1: Authentications that can continue: publickey,password
debug3: start over, passed a different list publickey,password
debug3: preferred gssapi-keyex,gssapi-with-mic,publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
...

所以通过上面的ssh详细日志可以得出

  • tcp连接是成功建立的
  • ssh的握手连接是持续进行send和received的
  • SSH2_MSG_SERVICE_REQUEST在发送一个68bytes的包时等待了10s

所以在建立链接成功以后,等待了服务端返回某种响应,看起来不像是网络问题,而更可能是服务端的sshd服务的问题。

查看服务端的进程

[root@testyum .ssh]# ps -ef --sort start_time | grep [s]shd
root      45251      1  0 08:48 ?        00:00:00 sshd: root@pts/1
root      46417      1  0 08:57 ?        00:00:00 sshd: root@pts/2
root      46596      1  0 08:58 ?        00:00:00 /usr/sbin/sshd -D
root      48669  46596  0 09:27 ?        00:00:00 sshd: root@pts/3

这里服务端sshd进程是正常运行的,监听端口是22,而当客户端发起ssh连接的过程中,新增了两个进程accepted和net

[root@testyum .ssh]# ps -ef --sort start_time | grep [s]shd
root      45251      1  0 08:48 ?        00:00:00 sshd: root@pts/1
root      46417      1  0 08:57 ?        00:00:00 sshd: root@pts/2
root      46596      1  0 08:58 ?        00:00:00 /usr/sbin/sshd -D
root      48994  46596  0 09:30 ?        00:00:00 sshd: [accepted]
sshd      48995  48994  0 09:30 ?        00:00:00 sshd: [net]

到这里可以判断服务端也确实建立了连接,置于为什么需要10s,可以用strace命令来进行进一步的分析。

  • -r: 打印每个系统调用开始的时间戳
  • -T: 打印每个系统调用花费的时间
  • -f: 跟踪和记录子进程
  • -p: 关联pid并跟踪
strace -o /tmp/1.trc -r -T -f -p 46596

将得到的结果根据消耗时间进行排序,找出消耗时间最长的系统调用

[root@testyum ~]# sort -nrk 2 /tmp/1.trc | head
46596      5.137961 accept(3, {sa_family=AF_INET, sin_port=htons(51889), sin_addr=inet_addr("192.168.146.43")}, [16]) = 5 <0.000030>
50382      5.005162 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000017>
50382      5.004525 close(4)            = 0 <0.000010>
50382      0.040266 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000016>
50383      0.039898 read(3, "\v\3239\0033~\260\220\223\4\230x\356\307d\204\302@\314\237\314\272o\261\7\371\3477{\352\3251"..., 8192) = 52 <0.000020>
50418      0.011046 recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\0015\0\0\0\6\0\0\0\211\0\0\0", 16}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 16 <0.000020>
50383      0.005337 read(3, "\252g\25\232\306h\225\373\245\250\300\207\275\205\360Pm1\245\275r\204Q\324\224\5\331Nb\3556\311"..., 8192) = 644 <0.000006>
50383      0.002997 read(3, "\0\0\0\f\n\25\0\0\0\0\0\0\0\0\0\0", 8192) = 16 <0.000016>
50383      0.002341 read(3, "\0\0\1\f\5 \0\0\1\1\0\272\212\352\200\242\213`\207\27dF\2648mo\213$\275t\264\353"..., 8192) = 272 <0.000016>
50383      0.002071 write(3, "\0\0\1\24\10\37\0\0\1\1\0\3600\305\23\325\306iO\260\2259\354\371\330)\6\10\251b\200\355"..., 280) = 280 <0.000053>

poll([{fd=4, events=POLLOUT}], 1, 0) = 1这个系统调用等待了5s,这里可以知道它是在读取文件4fd=4

转过头去从原始trace里去找到descriptor #4是指向的什么,这里通过关键字socket,connect,= 4 <等去搜索

50382      0.000081 socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000017>
50382      0.000038 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.0.1")}, 16) = 0 <0.000032>

这个socket()已经创建了一个internet socket(AF_INET),根据SOCK_DGRAM判断是一个UDP socket,下面的conect ()根据建立的socket去访问192.168.0.1的53端口,因为udp的无状态性质,所以并不会去检查目标主机是否存在和是否响应,所以connect()调用很快就完成了。

我去ping 192.168.0.1这个地址发现无法ping通,通过搜索知道53端口是DNS服务

[root@testyum ~]# grep " 53/" /etc/services
domain          53/tcp                          # name-domain server
domain          53/udp

当发起ssh连接时,sshd服务会去访问DNS服务器,通过nslookup这个ip

[root@testyum ~]# nslookup 192.168.0.1
;; connection timed out; trying next origin
;; connection timed out; no servers could be reached

超过10s后,直接报错了,检查服务端的dns配置

[root@testyum ~]# cat /etc/resolv.conf 
search localdomain
nameserver 192.168.0.1

这个ip设置的有问题,修改成正确的以后重试

[root@localhost ~]# time ssh 192.168.146.104
real	0m0.397s
user	0m0.017s
sys	0m0.014s

另外也可以直接修改sshd_config的配置,注释掉UseDNS yes