# Overlay Networks Administration Guide

To better understand the implementation of overlay networks from a
non-data path point of view, it's useful to put together a straw man
proposal for how these work from an administration point of view.

This goes through a bunch of options and leaves some open questions. Any
thoughts on it should be directed to Robert Mustacchi <rm@joyent.com> or
rmustacc on irc.freenode.net.

## Overlay Devices Background

It's worth going into a bit of background as to what overlay devices
look like. An overlay device is a combination of an etherstub and an IP
tunnel device with potentially multiple destinations. An overlay device
has two different components: an encapsulation module and a search
module.

The encapsulation module defines how packets get transmitted on the wire
and aspects of their transport. Consider the two rather popular
encapsulation methods: VXLAN and NVGRE. The VXLAN specification defines
the VXLAN header that gets put on an Ethernet frame and then defines
that it be sent over UDP. NVGRE on the other hand defines that it should
use a GRE header with a specific GRE protocol field, and GRE is itself
a separate IP protocol type.

In addition, because encapsulation modules today define the mechanism
for encapsulation and decapsulation, they also define how they should
receive packets. In other words, what IP address and, in the case of
VXLAN, what port they listen on.
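
To make the encapsulation side a bit more concrete, here's a rough
sketch of what the VXLAN module puts in front of an Ethernet frame
before handing it to UDP. The header layout (a flags byte with the
"VNI valid" bit, 24 reserved bits, the 24-bit network id, 8 reserved
bits) is from the VXLAN specification; the function names are made up
for illustration and Python is only being used as a sketching language
here.

```python
import struct

# "I" flag from the VXLAN specification: the VNI field is valid.
VXLAN_FLAG_VNI_VALID = 0x08


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a virtual network id."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("virtual network id must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an Ethernet frame with a VXLAN header (the UDP payload)."""
    return vxlan_header(vni) + inner_frame
```

The result of vxlan_encap() is what would be sent as the UDP payload to
whatever destination the search plugin picks.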


The search plugin, on the other hand, defines how we route encapsulated
packets to a destination. The simplest form of this is a point to
point tunnel where the search plugin sends everything to a single
unicast IP address. Only slightly more complicated would be something
that sends everything to a single multicast address. However, these plugins also
allow for custom means of looking up destinations and dynamic
destinations. In other words, sending an Ethernet frame to a different
destination based on its mac address or other criteria. This may be
implemented by having a list of mappings in files, or there may be some
much more advanced mechanism that integrates with a broader
orchestration suite or other database. We need to think about both of
these classes of search plugins, static and dynamic, as we go through
this.
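
As a sketch of the difference between the two classes, assume a search
plugin is just something with a lookup() that maps a destination mac
address to an (IP, port) pair. The class and method names here are
hypothetical and are not a proposed interface.

```python
from typing import Dict, Optional, Tuple

Dest = Tuple[str, int]  # (IP address, port)


class DirectSearch:
    """Static: every frame goes to the same destination."""

    def __init__(self, dest_ip: str, dest_port: int) -> None:
        self.dest = (dest_ip, dest_port)

    def lookup(self, mac: str) -> Optional[Dest]:
        return self.dest


class TableSearch:
    """Dynamic: the destination depends on the frame's mac address."""

    def __init__(self, table: Dict[str, Dest]) -> None:
        # Normalize keys so lookups are case-insensitive.
        self.table = {mac.lower(): dest for mac, dest in table.items()}

    def lookup(self, mac: str) -> Optional[Dest]:
        # None means "no known destination"; what happens next
        # (drop, ask something else) is up to the plugin.
        return self.table.get(mac.lower())
```

Where the table comes from (a file, an orchestration database) is what
separates one dynamic plugin from another; the lookup shape stays the
same.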

On top of these devices, a series of VNICs can be created, as with an
etherstub. All their traffic will be subject to a local virtual switch
and then sent over the encapsulated network.

## Basic model

We want to introduce a new object into dladm that we call an overlay. An
overlay is its own class, like bridges, IP tunnels, and etherstubs.
There would generally be four basic commands, much like other parts of
dladm:

  o dladm create-overlay	-- Creates an overlay device
  o dladm show-overlay		-- Shows information about overlay devices
  o dladm modify-overlay	-- Modifies aspects of overlay devices
  o dladm delete-overlay	-- Remove an overlay device


### Example: Creating a point to point VXLAN overlay network

Let's consider the act of creating and manipulating a VXLAN overlay
network. There are a handful of properties that we care about:

  o The virtual network id, the unique VXLAN identifier
  o The UDP IP address and port that we're listening on
  o The UDP IP address and port that we're going to send to

In the VXLAN case, we have a default port that we'd like to use that's
been assigned by IANA, but obviously it should be possible to override
it. In addition, we've listed two implicit properties in this example
that we want to make first class:

  o The encapsulation plugin, VXLAN
  o The search plugin, a direct point to point tunnel we'll call 'direct'

Here's one idea of how this could look:

```
dladm create-overlay -e vxlan -s direct -v 169 -p vxlan/listen_ip=a.b.c.d -p direct/dest_ip=e.f.g.h overlay0
```

If we take this apart, -e specifies the encapsulation module that we
should use for this device, while -s specifies that we should use the
'direct' search plugin, which is a point to point tunnel. While we have
a default value for vxlan/listen_port, and can use the same default for
the search plugin's port, we don't have one for an IP address. Because
of this, we have to specify the IP addresses on the command line.

In this world, -e, -s, and -v are sugar for properties: respectively,
the properties 'encap', 'search', and 'vnetid'. That means this could
look like:

```
dladm create-overlay -p encap=vxlan -p search=direct -p vnetid=169 -p vxlan/listen_ip=a.b.c.d -p direct/dest_ip=e.f.g.h overlay0
```
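
To show that the two forms really are equivalent, here's a minimal
sketch of how the option parsing could fold the short flags into
properties. It is written in Python purely for illustration; the
function name and the duplicate-property check are my own assumptions,
not something dladm does today.

```python
import getopt

# Short flags that are just sugar for a property name.
SUGAR = {"-e": "encap", "-s": "search", "-v": "vnetid"}


def parse_create_overlay(argv):
    """Return (overlay name, property dict) for create-overlay arguments."""
    opts, args = getopt.getopt(argv, "e:s:v:p:")
    if len(args) != 1:
        raise ValueError("expected exactly one overlay name")
    props = {}
    for flag, value in opts:
        if flag == "-p":
            name, sep, val = value.partition("=")
            if not sep or not name:
                raise ValueError("expected name=value, got %r" % value)
        else:
            name, val = SUGAR[flag], value
        if name in props:
            raise ValueError("property %r given more than once" % name)
        props[name] = val
    return args[0], props
```

Both command lines above end up as the same property dictionary, so
everything past the parser only ever has to deal with properties.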

In addition, much like with other dladm objects, a user would also be
able to specify other properties on the creation command line, such as
the vxlan listen port (vxlan/listen_port) or something like the MTU.

I believe that having a summarized version, with some options reflected
as abbreviations like in the first form, makes sense. But we don't want
to elevate too much into short letters. Otherwise, even though some
overlap between plugins exists, every new plugin will claim more
letters and we'll run out.

## Property namespaces

In the above, you'll notice that we've namespaced various properties
such as the vxlan/listen_ip. There is nothing special about these
properties. This is just a way of grouping things in a way that might be
a bit more useful for folks. This makes it easy to see which properties
are related to the fact that we're doing vxlan encapsulation or some
other plugin-based aspect. The unprefixed properties should all be
generic.
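
A small sketch of what that grouping amounts to: the namespace is just
whatever comes before the first '/', and anything without one is
generic. The helper names are hypothetical.

```python
def split_namespace(name):
    """Split 'vxlan/listen_ip' into ('vxlan', 'listen_ip').

    Unprefixed (generic) properties get an empty namespace.
    """
    namespace, sep, key = name.partition("/")
    return (namespace, key) if sep else ("", name)


def group_by_namespace(props):
    """Group a flat property dict by namespace, e.g. for display."""
    grouped = {}
    for name, value in props.items():
        namespace, key = split_namespace(name)
        grouped.setdefault(namespace, {})[key] = value
    return grouped
```

Something like show-overlay could use this to print the generic
properties first and then one block per plugin.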


## Default and required properties

Each plugin is going to have to define its own set of required
properties and whether they have default values. For example, vxlan
defines two properties:

  o  vxlan/listen_ip
  o  vxlan/listen_port

The listen_port has a default value of 4789. While this is what most
folks will want to use, some will want to override it at creation time.
However, just like we can modify the defaults with ipadm, we'll want to
do the same here and have some mode that allows us to modify them. This
will allow an administrator to override a default property for their
site.

In addition to those defaults, some plugins have properties that require
an argument, but have no default. There's no good default for an IP
address to listen on. INADDR_ANY or loopback (and their v6 equivalents)
are almost certainly wrong choices. As such, the plugins will need to
define their required properties. This should probably be displayed in
some way along with the defaults. Perhaps this is done as a new flag on
modify-overlay and show-overlay to see the default/required sets?

```
dladm show-overlay -d
dladm modify-overlay -d
```
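
One way to think about how the pieces fit together: each plugin
declares its properties as either "has a built-in default" or
"required", an administrator may override defaults for the site, and
whatever the user passes at creation time wins over both. The sketch
below is an assumption about the resolution order, not a description of
an existing implementation, and the names are invented.

```python
# Sentinel for "no default; the user must supply a value".
REQUIRED = object()

# What the vxlan plugin would declare, per the text above.
VXLAN_PROPS = {
    "vxlan/listen_ip": REQUIRED,
    "vxlan/listen_port": "4789",
}


def resolve_props(plugin_props, site_defaults, user_props):
    """Resolve each property: user value, then site default, then built-in."""
    resolved = {}
    missing = []
    for name, builtin in plugin_props.items():
        if name in user_props:
            resolved[name] = user_props[name]
        elif name in site_defaults:
            resolved[name] = site_defaults[name]
        elif builtin is not REQUIRED:
            resolved[name] = builtin
        else:
            missing.append(name)
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return resolved
```

With this shape, the -d output is just plugin_props merged with
site_defaults, and anything still marked REQUIRED is what the user has
to supply.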


## Working through other examples

To help evaluate other behavior and choices, it's worth working through
some other examples and seeing what kinds of questions this causes us to
have.

### NVGRE and Geneve

NVGRE and Geneve are other encapsulation modules. NVGRE is already
widely deployed, generally in Microsoft-related software. Today, it
fits the model fairly well. NVGRE just has a single property:

  o nvgre/listen_ip

Geneve on the other hand is yet another draft that is supposed to bring
the titans of NVGRE and VXLAN together. It's another UDP-based protocol.
It has properties like VXLAN:

  o geneve/listen_ip
  o geneve/listen_port


Now, you might be saying, but wait rm, we're always redefining
listen_ip! Can't we just make that a first class property? Oh, and maybe
listen_port, but wait, NVGRE isn't using it. Hmm.

So, I have thought about this, but I think for the time being, I'd
rather make them specific to the plugin. There may be something new that comes
around and actually doesn't use IP at all and in fact speaks on an
ethertype. At which point we'll want to use dls/dlpi/vnd to manage that
communication and there will not be any IP. Of course, this doesn't
exist today, but I think it doesn't make sense to elevate it at first.
If it turns out that this is all terrible, then we can go figure that
out.

### Multicast

Another common thing that we're going to build out of the gate is a
multicast module. Unlike the direct model, with the multicast model, the
user subscribes to a multicast address to receive packets and sends
packets out to a multicast group. In this case, the normal listen_ip
would still exist, it would just be a multicast IP. However, instead of
a direct/dest_ip, we would have:

  o multicast/dest_ip

### Search plugins and different encap types

One of the things that we'd like to do is have an encapsulation
algorithm encode what it requires to be directed, e.g. VXLAN requires an
IP address and a port. NVGRE just requires an IP. We'll have to work out
an interface, but the search plugin will be able to get some information
from the encapsulation plugin which will let it know if it should
require an IP address, an IP address and a port, or some other set of
configuration. It will also provide a way to share defaults.
Realistically, the search plugin should probably just inherit the
default from the encap plugin, if appropriate. Otherwise an
administrator should define a global default for all encapsulation
plugins for the search module. While the former is useful, the latter
isn't going to be very realistic.
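
As a strawman for that interface: the encapsulation plugin advertises
whether a destination needs a port and what its default port is, and
the search plugin derives its own property set from that. Everything
below is hypothetical, including the 'dest_port' property name; only
the VXLAN default port and the "NVGRE needs just an IP" rule come from
the text above.

```python
# Sentinel for "no default; the user must supply a value".
REQUIRED = object()

# What each encapsulation plugin would tell the search plugin.
ENCAP_DEST_INFO = {
    "vxlan": {"needs_port": True, "default_port": 4789},
    "nvgre": {"needs_port": False, "default_port": None},
}


def search_dest_props(search, encap):
    """Derive the destination properties for a search plugin."""
    info = ENCAP_DEST_INFO[encap]
    # There is never a sensible default for the destination address.
    props = {search + "/dest_ip": REQUIRED}
    if info["needs_port"]:
        default = info["default_port"]
        # Inherit the encap plugin's default port when it has one.
        props[search + "/dest_port"] = REQUIRED if default is None else default
    return props
```

So 'direct' on top of vxlan ends up needing only an IP from the user,
with the port inherited, while 'direct' on top of nvgre has no port
property at all.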

### Files plugin and dynamic destinations

The files plugin is interesting, because it moves away from the world of
static maps to dynamic destinations. In this case, the destination of a
packet with a mac address m is defined by some file on the system,
similar to /etc/hosts (but an entirely different format).

The interesting thing about the files model is that it requires some
additional features, which we need to implement anyways. For example
we'll need to:

  o Be able to flush the destination map
  o Obtain some way of getting the destination map from the kernel
  o For the files case, tell it to refresh the mappings from the file
  o Update the mapping file to something else

While the last of these would be done through a normal modify-overlay of
a property, the question is how would we handle all of these other
actions. They basically have us performing actions on the device that,
in some cases, are specific to the plugin-type.

How should this be done? Should these be kind of transitive properties
that exist? Like a 'files/refresh' property that you set to true? I'm
not sure I really like the idea of transitive properties. I'd perhaps
rather have some alternative option space in modify-overlay that allows
you to inject one of these actions.

Perhaps all three of these things should instead be part of some
command that isn't dladm? I dunno, I'm not really partial to that.
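
Regardless of how these end up being exposed, the four actions listed
above boil down to something like the following sketch. Note that the
file format used here (one "mac ip port" triple per line, '#' for
comments) is entirely made up for the example, since the real format
hasn't been defined, and the class and method names are hypothetical
too.

```python
def parse_map(lines):
    """Parse 'mac ip port' lines into {mac: (ip, port)}.

    The format is invented for this sketch; blank lines and
    '#' comments are ignored.
    """
    table = {}
    for lineno, line in enumerate(lines, 1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("line %d: expected 'mac ip port'" % lineno)
        mac, ip, port = fields
        table[mac.lower()] = (ip, int(port))
    return table


class FilesSearch:
    def __init__(self, path):
        self.path = path
        self.table = {}  # empty until refresh() is called

    def refresh(self):
        """Reload the mappings from the current file."""
        with open(self.path) as f:
            self.table = parse_map(f)

    def flush(self):
        """Drop every cached destination."""
        self.table = {}

    def dump(self):
        """Return a copy of the destination map."""
        return dict(self.table)

    def set_path(self, path):
        """Point at a different mapping file and load it."""
        self.path = path
        self.refresh()

    def lookup(self, mac):
        return self.table.get(mac.lower())
```

Only set_path() looks like a property change; refresh(), flush(), and
dump() are actions, which is exactly why they don't fit neatly into
modify-overlay.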

## Concluding Thoughts

All in all, I think we have a reasonable framework to start with, but
some of these details kind of matter and as we go further down the path
of implementation, the tradeoffs and options that we want to make will
make more sense. It may very well be that there will not be many
properties except for the Joyent or other user specific search
mechanisms and therefore spending too much time on this isn't
worthwhile.