Introduction to Docker MTU

This article applies to older Docker releases (versions below 1.10, not including 1.10).

docker mtu

By default, the Docker server creates and configures the host system’s docker0 interface as an Ethernet bridge inside the Linux kernel that can pass packets back and forth between other physical or virtual network interfaces so that they behave as a single Ethernet network.

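==When Docker starts==, the Docker server creates a docker0 interface in the kernel to act as a bridge. Through this bridge, virtual and physical network interfaces can pass packets to one another.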

Docker configures docker0 with an IP address, netmask and IP allocation range. The host machine can both receive and send packets to containers connected to the bridge, and gives it an MTU – the maximum transmission unit or largest packet length that the interface will allow – of either 1,500 bytes or else a more specific value copied from the Docker host’s interface that supports its default route. These options are configurable at server startup:

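Specifically, Docker configures docker0 with an IP address, a netmask, and an IP address pool, so the host can exchange data with any container attached to docker0. It also sets the MTU of docker0 to 1500 or to the MTU of the host's default interface (==Docker versions below 1.10==). All of this happens when Docker starts.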

Once you have one or more containers up and running, you can confirm that Docker has properly connected them to the docker0 bridge by running the brctl command on the host machine and looking at the interfaces column of the output. Here is a host with two different containers connected:

As soon as a container starts, Docker attaches it to docker0. You can check this with the sudo brctl show command:

$ sudo brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.3a1d7362b4ee       no              veth65f9
                                                        vethdda6
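Where brctl is not installed, the same membership can usually be read from sysfs. The following is a minimal sketch, not part of the original test: bridge_ports is a hypothetical helper name, and it assumes the kernel exposes bridge members under /sys/class/net/<bridge>/brif.

```shell
# bridge_ports BRIDGE [SYSFS_NET_DIR]
# Hypothetical helper: list the interfaces attached to a Linux bridge by
# reading its brif directory in sysfs. The second argument exists only so
# the function can be pointed at a fake directory tree.
bridge_ports() {
    bridge=$1
    sysfs_net=${2:-/sys/class/net}
    # Fail when the device is missing or is not a bridge.
    [ -d "$sysfs_net/$bridge/brif" ] || return 1
    ls "$sysfs_net/$bridge/brif"
}
```

On a host like the one above, `bridge_ports docker0` would be expected to list the veth interfaces shown in the interfaces column.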

Finally, the docker0 Ethernet bridge settings are used every time you create a new container. Docker selects a free IP address from the range available on the bridge each time you docker run a new container, and configures the container’s eth0 interface with that IP address and the bridge’s netmask. The Docker host’s own IP address on the bridge is used as the default gateway by which each container reaches the rest of the Internet.

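Finally, each time a new container is created, Docker uses the docker0 configuration to set up the container's eth0 interface: it picks an address from the IP pool and configures eth0 with it, and the host's IP address on the bridge serves as the gateway through which the container reaches the network.
Reference link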

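Summary: Docker uses the MTU of the host's default interface as the container MTU (falling back to 1500 otherwise). ==When a new container is run or an existing container is restarted==, the Docker server automatically resets the docker0 MTU to the host's MTU. See the test below for details.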
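The selection rule in the summary can be approximated from the shell. This is only a sketch of the rule as described here, not Docker's actual code: default_route_iface and legacy_bridge_mtu are hypothetical helper names, and the sketch assumes the routing table comes from `ip -4 route show default` and that interface MTUs are readable under /sys/class/net.

```shell
# default_route_iface: read `ip -4 route show default` output on stdin and
# print the device of the first default route (hypothetical helper).
default_route_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# legacy_bridge_mtu [SYSFS_NET_DIR]: routing table on stdin.
# Print the MTU of the default-route interface, or 1500 when no default
# route or no readable MTU is found (the fallback described above).
legacy_bridge_mtu() {
    sysfs_net=${1:-/sys/class/net}
    iface=$(default_route_iface)
    if [ -n "$iface" ] && [ -r "$sysfs_net/$iface/mtu" ]; then
        cat "$sysfs_net/$iface/mtu"
    else
        echo 1500
    fi
}
```

A typical invocation would be `ip -4 route show default | legacy_bridge_mtu`. Only the first default route is considered, which is one of the weaknesses that led to this behavior being dropped in later versions.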

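Docker 1.10 and later dropped this behavior, for these reasons:

  • the MTU of the host interface can be changed at any time,
  • the host may have more than one default route,
  • Docker traffic may not go through the default route,
  • the kernel's path MTU discovery (PMTUD) mechanism can handle this problem.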

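==From version 1.10 on, Docker simply uses the default value of 1500 and no longer derives the Docker MTU from the host's default interface; the kernel's PMTUD mechanism is left to resolve MTU differences.==
Reference link

The current Docker documentation describes the MTU as follows: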

Docker configures docker0 with an IP address, netmask and IP allocation range. The host machine can both receive and send packets to containers connected to the bridge, and gives it an MTU – the maximum transmission unit or largest packet length that the interface will allow – ==of 1,500 bytes==. These options are configurable at server startup:

See the official documentation for details.

Docker MTU test

Environment: CentOS 7, Docker 1.8.2

Current MTU values: host eth0 is 1500 and docker0 is 1500:

[root@103-16-30-sh-100-i05 ~]# ifconfig
docker0: flags=4099 mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:de:68:5d:70 txqueuelen 0 (Ethernet)
RX packets 14195 bytes 4644224 (4.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19776 bytes 24864803 (23.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163 mtu 1500
inet 10.103.16.30 netmask 255.255.248.0 broadcast 10.103.23.255
ether 90:b1:1c:41:55:55 txqueuelen 1000 (Ethernet)
RX packets 57092433 bytes 5068956141 (4.7 GiB)
RX errors 0 dropped 1295 overruns 0 frame 0
TX packets 22613383 bytes 4554035489 (4.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 16

Change the MTU of the host's eth0 to 1460:

[root@103-16-30-sh-100-i05 ~]# ifconfig
docker0: flags=4099 mtu 1500
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:de:68:5d:70 txqueuelen 0 (Ethernet)
RX packets 14195 bytes 4644224 (4.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19776 bytes 24864803 (23.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163 mtu 1460
inet 10.103.16.30 netmask 255.255.248.0 broadcast 10.103.23.255
ether 90:b1:1c:41:55:55 txqueuelen 1000 (Ethernet)
RX packets 57097432 bytes 5069384517 (4.7 GiB)
RX errors 0 dropped 1295 overruns 0 frame 0
TX packets 22614564 bytes 4554268290 (4.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 16

Run a new container:

[root@103-16-30-sh-100-i05 ~]# docker run busybox ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:01
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1460 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Check again; the MTU of docker0 has changed:

[root@103-16-30-sh-100-i05 ~]# ifconfig
docker0: flags=4099 mtu 1460
inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:de:68:5d:70 txqueuelen 0 (Ethernet)
RX packets 14195 bytes 4644224 (4.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 19776 bytes 24864803 (23.7 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163 mtu 1460
inet 10.103.16.30 netmask 255.255.248.0 broadcast 10.103.23.255
ether 90:b1:1c:41:55:55 txqueuelen 1000 (Ethernet)
RX packets 57099443 bytes 5069560510 (4.7 GiB)
RX errors 0 dropped 1295 overruns 0 frame 0
TX packets 22615341 bytes 4554430631 (4.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 16

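==Docker 1.10 and later default to an MTU of 1500 and no longer use the MTU of the host's default interface==. The test steps are the same, so the results are not reproduced here.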
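To repeat the comparison without reading full ifconfig dumps by eye, the MTU values can be pulled out with a small filter. This is a sketch under assumptions: iface_mtus is a hypothetical helper, and it only understands the two output layouts that appear in this test (the net-tools style `mtu 1500` on the host and the busybox style `MTU:1460` in the container), with continuation lines indented as real ifconfig prints them.

```shell
# iface_mtus: read ifconfig output on stdin and print "<interface> <mtu>"
# for every interface found (hypothetical helper).
iface_mtus() {
    awk '
        # A line that starts in column one begins a new interface block.
        /^[^ \t]/ { iface = $1; sub(/:$/, "", iface) }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "mtu" && i < NF) {
                    # net-tools 2.x layout: "... mtu 1500"
                    print iface, $(i + 1)
                } else if ($i ~ /^MTU:/) {
                    # busybox / older layout: "... MTU:1460 ..."
                    v = $i; sub(/^MTU:/, "", v); print iface, v
                }
            }
        }'
}
```

For example, `ifconfig | iface_mtus` on the host and `docker run busybox ifconfig | iface_mtus` for a container give one line per interface, which makes it easy to compare the values before and after changing the host MTU.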

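Problems caused by a mismatch between the host MTU and the Docker MTU

A mismatch between the Docker MTU and the MTU of the host interface can cause serious problems. For example:
docker0 MTU = 1500
host eth0 MTU = 1400
Possible symptom: packets sent from a container that are larger than the eth0 MTU may be dropped when they pass through the host's eth0. For example, apt-get update hangs with the message "waiting for headers". For details, see https://github.com/docker/docker/issues/22028 and https://github.com/docker/docker/issues/12565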

The main cause of this problem:
PMTUD is not working correctly, possibly because of an ICMP black hole (the ICMP "fragmentation needed" messages are being lost).
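The arithmetic behind the stall can be worked through in a few lines. This is an illustrative sketch, assuming IPv4 and TCP headers without options (20 bytes each) and an 8-byte ICMP echo header; the function names are hypothetical.

```shell
# TCP MSS implied by a link MTU: MTU minus the IPv4 header (20 bytes)
# and the TCP header (20 bytes), both without options.
tcp_mss_for_mtu() { echo $(( $1 - 40 )); }

# Largest ICMP echo payload that fits in one packet: MTU minus the
# IPv4 header (20 bytes) and the ICMP echo header (8 bytes).
ping_payload_for_mtu() { echo $(( $1 - 28 )); }

# packet_too_big PACKET_SIZE LINK_MTU: succeed when the packet cannot
# cross the link without being fragmented.
packet_too_big() { [ "$1" -gt "$2" ]; }
```

With the values from the example, a container behind doc0 at 1500 derives an MSS of `tcp_mss_for_mtu 1500`, that is 1460, so a full segment is a 1500-byte packet, and `packet_too_big 1500 1400` succeeds: the packet does not fit through eth0. Such packets normally carry the don't-fragment bit, so delivery depends on the ICMP "fragmentation needed" reply getting back to the sender. With iputils ping, a probe such as `ping -M do -s "$(ping_payload_for_mtu 1400)" <host>` should pass, and a larger payload should fail.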

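Solutions:

  1. The simplest fix is to make the Docker MTU match the host MTU, using the --mtu option of the Docker daemon. On Docker versions below 1.10, restarting Docker and the containers is enough to reset the Docker MTU to the MTU of the host's default interface.

  2. The other approach is to make sure PMTUD works correctly:

  • Check that the PMTU kernel parameter is set correctly:
    /proc/sys/net/ipv4/ip_no_pmtu_disc=0 (0 means PMTUD is enabled)
  • Add an iptables rule:
    iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu (clamps the TCP MSS to the path MTU)
  • Make sure ICMP packets are not being dropped.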
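The first two checks can be scripted as a quick diagnostic. This is a minimal sketch rather than a complete tool: suggest_mtu_fix and pmtud_enabled are hypothetical helpers, and the optional directory argument exists only so the check can be run against a fake /proc tree.

```shell
# suggest_mtu_fix BRIDGE_MTU HOST_MTU
# Report whether the bridge MTU exceeds the host MTU and, if so, which
# value to pass to the daemon's --mtu option (hypothetical helper).
suggest_mtu_fix() {
    if [ "$1" -gt "$2" ]; then
        echo "docker0 MTU $1 exceeds host MTU $2: start the daemon with --mtu=$2"
    else
        echo "docker0 MTU $1 fits within host MTU $2"
    fi
}

# pmtud_enabled [PROC_ROOT]
# Succeed when net.ipv4.ip_no_pmtu_disc is 0, meaning PMTUD is enabled.
pmtud_enabled() {
    proc_root=${1:-/proc}
    [ "$(cat "$proc_root/sys/net/ipv4/ip_no_pmtu_disc" 2>/dev/null)" = "0" ]
}
```

For example, `suggest_mtu_fix "$(cat /sys/class/net/docker0/mtu)" "$(cat /sys/class/net/eth0/mtu)"` compares the live values, assuming the host interface is named eth0 as in the test above, and `pmtud_enabled || echo "PMTUD is disabled"` flags a misconfigured kernel parameter. Neither check can tell whether ICMP is being filtered somewhere along the path, so that still has to be verified separately.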