A simple walk-through of inspecting how a Docker overlay network works at the network level.

Prerequisites for this lab

First, we create three Ubuntu 14.04 hosts with Docker 1.11.1 installed on them.
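
A quick sanity check on each host before continuing (the --format flag just trims the output to the server version), which should print 1.11.1:

docker version --format '{{.Server.Version}}'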

We need a key/value (KV) store to enable multi-host/overlay networking in Docker. Start Consul on one of the hosts:

docker run -d -p 8500:8500 -h consul --name consul progrium/consul -server -bootstrap
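
To check that Consul is up and has elected itself leader, query its HTTP API (here $MASTER_IP is assumed to hold the IP address of the host running Consul):

curl http://$MASTER_IP:8500/v1/status/leader

This should return the address of the Consul leader (an IP with port 8300).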

Now we need to let our Docker engines know where to find Consul. On each host, set the daemon options (DOCKER_OPTS in /etc/default/docker on Ubuntu 14.04), with $MASTER_IP pointing to the host running Consul:

DOCKER_OPTS="-H tcp://0.0.0.0:2375 \
  -H unix:///var/run/docker.sock \
  --cluster-store=consul://$MASTER_IP:8500/network \
  --cluster-advertise=eth0:2375"

Restart the Docker engine on each host to pick up the cluster-store configuration:

sudo service docker restart
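
After the restart, docker info should list the cluster store and advertise address we just configured (placeholders here, your addresses will differ):

docker info | grep -i cluster

Cluster store: consul://<master-ip>:8500/network
Cluster advertise: <host-ip>:2375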

Create an overlay network on one of the hosts:

docker network create -d overlay --subnet 10.10.10.0/24 multinet

Check that it was created:

docker network ls

NETWORK ID          NAME                DRIVER
4a91d51c8352        bridge              bridge
9396e02c30e4        docker_gwbridge     bridge
a188f529878c        host                host
ec752ef8859b        multinet            overlay
3a30f03b6183        none                null

docker network inspect multinet

[
    {
        "Name": "multinet",
        "Id": "ec752ef8859b9a7db88305a6065cc1d85ce04679f5492bebba97171928afcfb4",
        "Scope": "global",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "10.10.10.0/24"
                }
            ]
        },
        "Internal": false,
        "Containers": {},
        "Options": {},
        "Labels": {}
    }
]
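
Because the network has global scope, its definition lives in the KV store and it should show up on the other hosts as well, with the exact same network ID. On node-2 and node-3:

docker network ls
docker network inspect multinet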

Start a simple container on each host, attached to the overlay network

On node-1:

docker run --net multinet --name node1test -d busybox

On node-2:

docker run --net multinet --name node2test -d busybox
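
Before digging into the plumbing, a quick check that the two containers can actually reach each other across the overlay (Docker's embedded DNS resolves container names on user-defined networks). On node-1:

docker exec -it node1test ping -c 3 node2test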

Notice the two new interfaces that were added on each host:

ip link

8: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:41:57:e3:79 brd ff:ff:ff:ff:ff:ff
31: vethdd1ad9b@if30: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP mode DEFAULT group default
    link/ether f6:a8:e9:12:49:37 brd ff:ff:ff:ff:ff:ff

It takes two to tango on a veth pair, so let's find its peer...

ethtool -S vethdd1ad9b

NIC statistics:
     peer_ifindex: 30

docker exec -it node1test ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
28: eth0@if29: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 02:42:0a:0a:0a:02 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.2/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe0a:a02/64 scope link
       valid_lft forever preferred_lft forever
30: eth1@if31: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.2/16 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link
       valid_lft forever preferred_lft forever

So there it is: the veth pair connects eth1 (ifindex 30) in the container to vethdd1ad9b (ifindex 31) on the host. What is vethdd1ad9b attached to? It turns out to be plugged into docker_gwbridge:

brctl show

bridge name	bridge id		STP enabled	interfaces
docker0		8000.024264855414	no
docker_gwbridge		8000.02424157e379	no		vethdd1ad9b

eth1 in the container connects to docker_gwbridge so that the container can reach the outside world. That must mean eth0 is connected to our multinet overlay network, carrying the traffic to containers on other Docker hosts.
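
The container's routing table backs this up: the default route points out of eth1 via the gateway on docker_gwbridge (172.18.0.1), while 10.10.10.0/24 is reached directly over eth0 (addresses may differ in your setup):

docker exec -it node1test ip route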

Also notice that the MTU of eth0 is set to 1450, which is 50 bytes less than the default of 1500. This is because VXLAN encapsulation adds exactly 50 bytes of overhead; lowering the MTU to 1450 lets the encapsulated packet still fit precisely within the standard 1500-byte MTU of the underlying network.
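
For reference, the 50 bytes of overhead break down as follows (assuming an IPv4 underlay):

outer Ethernet header  14 bytes
outer IPv4 header      20 bytes
outer UDP header        8 bytes
VXLAN header            8 bytes
---------------------------------
total                  50 bytes   (1500 - 50 = 1450 bytes left for the inner frame)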

Let's track eth0 and see where it leads. We know its ifindex is 28.

Docker uses network namespaces to create self-contained sets of interfaces and routing tables, providing dedicated bridges for its networks (container networking, overlay networks, etc.). Each overlay network therefore gets its own network namespace.

sudo ls -al /var/run/docker/netns

total 0
drwxr-xr-x 2 root root 80 May 11 18:36 .
drwx------ 4 root root 80 May 11 11:08 ..
-r--r--r-- 1 root root  0 May 11 18:36 2-ec752ef885
-r--r--r-- 1 root root  0 May 11 18:36 c146eed489de

We need to create some symlinks to let ip netns work nicely with Docker namespaces:

sudo mkdir -p /var/run/netns
sudo ln -s /var/run/docker/netns/2-ec752ef885 /var/run/netns/2-ec752ef885
sudo ln -s /var/run/docker/netns/c146eed489de /var/run/netns/c146eed489de
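
Alternatively (assuming util-linux's nsenter is available on the host), you can run a command inside a Docker namespace directly, without creating symlinks:

sudo nsenter --net=/var/run/docker/netns/2-ec752ef885 ip a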

Now to check all net namespaces:

ip netns list

c146eed489de
2-ec752ef885

In our lab we have only one container running on this host, so one namespace must belong to the container, while the other is for the overlay network. Let's check which namespace the container uses:

docker inspect --format '{{.NetworkSettings.SandboxKey}}' node1test

/var/run/docker/netns/c146eed489de

This means the other namespace is for the overlay network. Let's see what interfaces are in the overlay network namespace:

sudo ip netns exec 2-ec752ef885 ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 92:6c:6f:50:c1:81 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.1/24 scope global br0
       valid_lft forever preferred_lft forever
    inet6 fe80::a087:15ff:fe08:f5e3/64 scope link
       valid_lft forever preferred_lft forever
27: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UNKNOWN group default
    link/ether ba:40:1f:f9:6c:96 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b840:1fff:fef9:6c96/64 scope link
       valid_lft forever preferred_lft forever
29: veth2@if28: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP group default
    link/ether 92:6c:6f:50:c1:81 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::906c:6fff:fe50:c181/64 scope link
       valid_lft forever preferred_lft forever

Here we see br0, which connects containers on the multinet overlay network that live on the same host; vxlan1, which encapsulates traffic destined for containers on the other Docker hosts; and veth2. So what is on the other end of veth2?

sudo ip netns exec 2-ec752ef885 ethtool -S veth2

NIC statistics:
     peer_ifindex: 28

Aha! There is our ifindex 28! This means veth2 is the peer of eth0 inside the node1test container. So eth0 of our container is plugged into the br0 bridge inside the overlay network namespace:

sudo ip netns exec 2-ec752ef885 brctl show

bridge name	bridge id		STP enabled	interfaces
br0		8000.926c6f50c181	no		veth2
							vxlan1

We can see that vxlan1 is plugged into br0 as well. So this is how our container talks to containers on the other hosts.
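
You can watch the encapsulation happen on the wire: VXLAN travels over UDP port 4789, so capturing on the host's physical interface (eth0 here, adjust to your setup) while node1test pings node2test shows the encapsulated packets:

sudo tcpdump -ni eth0 udp port 4789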

Now one last question remains: how does the vxlan interface know where to send traffic destined for containers on other hosts?

As it turns out, the vxlan1 device maintains its own forwarding database (FDB), which maps the MAC addresses of remote containers to the IP addresses of the hosts (VTEPs) they live on. We can see this FDB with:

sudo ip netns exec 2-ec752ef885 bridge fdb show dev vxlan1

ba:40:1f:f9:6c:96 permanent
ba:40:1f:f9:6c:96 vlan 1 permanent
02:42:0a:0a:0a:03 dst 10.0.11.6 self permanent
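
Docker also programs static ARP (neighbour) entries for the remote container IPs into this namespace, so no ARP requests need to be flooded over the overlay. You should be able to see an entry mapping 10.10.10.3 to that same MAC on vxlan1:

sudo ip netns exec 2-ec752ef885 ip neigh show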

The IP 10.0.11.6 is the IP address of node-2. This is where our node2test container runs, which is also connected to the multinet overlay network. So whose MAC address is listed in that entry?

On node-2:

ip a

...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:03:bd:c6:f2:7d brd ff:ff:ff:ff:ff:ff
    inet 10.0.11.6/20 brd 10.0.15.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::3:bdff:fec6:f27d/64 scope link
       valid_lft forever preferred_lft forever
...

The MAC is not the host's. As it turns out, it's the MAC address of eth0 inside the node2test container:

docker exec -it node2test ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
17: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
    link/ether 02:42:0a:0a:0a:03 brd ff:ff:ff:ff:ff:ff
    inet 10.10.10.3/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:aff:fe0a:a03/64 scope link
       valid_lft forever preferred_lft forever
19: eth1@if20: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.2/16 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link
       valid_lft forever preferred_lft forever

This makes sense, since we saw that eth0 in the container is always connected to the overlay network.

So where do these entries come from? They are managed by the Docker engine, which exchanges them with the other Docker engines through Serf, a gossip protocol. The KV store (Consul in our example) is what lets the engines discover each other and form the cluster over which these Serf events are exchanged.
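
If you want to see this control plane in action: the gossip traffic between the engines uses port 7946 (as opposed to the VXLAN data plane on UDP 4789), so a capture on the host interface (eth0 here, adjust to your setup) should show the Serf exchanges:

sudo tcpdump -ni eth0 port 7946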
