Software Defined Networking (SDN) and Network Functions Virtualisation (NFV) are the future – and if you aren't already learning about them you're probably doomed. If that strikes you as a little pessimistic then there is a bright side: most of us are already doing some of it and we all understand more about it than we think.
SDN is the ability to rapidly detect and adapt to changes in network infrastructure. This can be, say, the addition of devices or changes in topology.
NFV is the ability to stand up, tear down, automate and orchestrate network elements in some easy-to-use manner. Network elements can include switches, routers, firewalls, Intrusion Detection Systems (IDS), monitoring, port mirroring and even entire clusters of virtual or physical server instances.
NFV is frequently lumped in with SDN, as the two technologies are highly complementary. It is possible to do NFV without SDN (see: Webmin's virtual twin). It is also entirely possible to implement SDN without layering NFV over the top.
None of this is new. The "wow" factor to SDN is that instead of having to log into each switch or router one at a time (via scripts, GUI or command line), your entire network is orchestrated by some centralised management server.
Let's consider a few practical examples.
SDN networking gear
The most colour-by-numbers type of hype-compliant SDN today involves switches that "separate the control plane from the data plane". Translated into human, this means people are finally making centralised management for switches so we don't have to log in via telnet or SSH to every switch on our network.
In practice, SDN means buying inexpensive switches where the hardware manufacturers don't make a lot of margin and installing expensive software on it. This is quite a change from the old practice of expensive hardware and terrible (or no) software.
The expensive software portion of the SDN equation allows switch configurations to be monitored in real time. When an event occurs (maybe a dead port, cable out, or switch down) the centralised control server or servers detect the issue and automatically change relevant network configurations to keep as many of the network services running as possible.
Consider for a moment a simple network with four switches. Each switch has a connection to two other switches. Two of the switches connect to the router that goes out to the internet. Cut any one connection between the switches and they would still all be able to see one another.
Basic 4 switch setup with a failed link between switches 1 and 3
Sadly, it's never that simple.
Let's say that the cable between Switch 1 and Switch 3 occupies port 4 on both switches. Every time Switch 1 is asked to find devices that are attached to Switch 3, it will fire those packets out of port 4, because that's what Switch 1's map of the network looks like.
If I cut the wire between Switch 1 and Switch 3, several things need to happen for Switch 1 to continue being able to send packets to devices located on Switch 3. The first: Switch 1 needs to know that the cable has been cut. This one's easy; even the dumbest of dumb switches knows when the cable's out.
Knowing that the cable is out, Switch 1 should now be able to understand that all those addresses it thought were available via port 4 now suddenly can't be reached there. This is where things get complicated.
Looking at the network map, we can all clearly see that to get from Switch 1 to Switch 3, packets need to be sent to Switch 2. Switch 1 is connected to Switch 2, which is in turn connected to Switch 3. For a switch, this isn't so easy to understand.
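What a human does by eyeballing the map is, in effect, a shortest-path search over a graph. As a loose sketch (the adjacency map below is invented to roughly match the diagram, with the Switch 1 to Switch 3 link already cut), this is what the "go via Switch 2" reasoning looks like when written down:

```python
from collections import deque

# Hypothetical topology after the Switch 1 <-> Switch 3 cable is cut.
links = {
    1: [2, 4],
    2: [1, 3],
    3: [2, 4],
    4: [1, 3],
}

def find_path(topology, src, dst):
    """Breadth-first search: returns the shortest switch-to-switch path."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for neighbour in topology[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route at all

print(find_path(links, 1, 3))  # [1, 2, 3]
```

Trivial for a central brain holding the whole map; not so trivial for an individual switch that only knows about its own ports.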
Après moi, le déluge
Normally, the way a switch finds out where devices on the network are located is by "flooding". The packet is sent out of every port (except the one it arrived on) on every switch on the network. (Network nerds: put VLANs to one side for a later discussion, please).
If the device addressed in the flood is connected to the network it will respond, and the switch (and all intermediate switches) will know how to send packets to get them from A to B.
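This flood-and-learn behaviour can be sketched in a few lines. The class and names below are invented for illustration; a real switch does this in silicon with MAC addresses, but the logic is the same: learn where the source lives, flood if the destination is unknown, forward out one port once it's known.

```python
class Switch:
    """Toy model of a flood-and-learn switch."""
    def __init__(self, name):
        self.name = name
        self.ports = {}       # port number -> whatever is plugged in
        self.mac_table = {}   # source address -> port it was learned on

    def receive(self, src, dst, in_port):
        self.mac_table[src] = in_port          # learn: src is reachable via in_port
        if dst in self.mac_table:
            return [self.mac_table[dst]]       # known destination: one port
        return [p for p in self.ports if p != in_port]  # unknown: flood

sw1 = Switch("sw1")
sw1.ports = {1: "host-a", 2: "sw2", 3: "sw3"}
print(sw1.receive("aa:bb", "cc:dd", in_port=1))  # unknown dst: flood -> [2, 3]
print(sw1.receive("cc:dd", "aa:bb", in_port=2))  # learned earlier -> [1]
```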
The problem is that if packets are supposed to be flooded throughout the entire network, you can't have more than one path between any two switches. If you do, you get a loop. Looking at the fully connected network map, if Switch 1 were to emit a packet towards Switch 3, that packet would then be forwarded from Switch 3 to Switch 2 and then back to Switch 1. It would be sent to Switch 3 again, which would forward it to Switch 2 – and we have our loop.
To counter this, technologies such as Spanning Tree Protocol (STP) were developed to detect loops and shut off redundant links. In theory, good STP implementations would detect when a link that causes a loop is down and bring up the redundant link so that packets will keep flowing. In practice, STP never quite seems to work as advertised.
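In essence, STP computes a spanning tree of the topology and puts the leftover links into a blocking state. The sketch below is a drastic simplification (real STP elects a root bridge by priority and exchanges BPDUs; the topology here is invented), but it shows the core idea: keep a loop-free tree, block the redundant links.

```python
# (switch, switch) pairs; 1-2-3 forms the loop from the example above.
links = [(1, 2), (1, 3), (2, 3), (3, 4)]

def spanning_tree(links, root=1):
    """BFS from the root: the first link to reach each switch is kept,
    every other link is 'blocked' -- roughly what STP does with redundancy."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append((a, b))
        adjacency.setdefault(b, []).append((a, b))
    keep, blocked, visited, frontier = [], [], {root}, [root]
    while frontier:
        node = frontier.pop(0)
        for link in adjacency[node]:
            other = link[0] if link[1] == node else link[1]
            if other not in visited:
                visited.add(other)
                keep.append(link)
                frontier.append(other)
            elif link not in keep and link not in blocked:
                blocked.append(link)
    return keep, blocked

keep, blocked = spanning_tree(links)
print("kept:", keep)        # kept: [(1, 2), (1, 3), (3, 4)]
print("blocked:", blocked)  # blocked: [(2, 3)]
```

The hard part – the part that "never quite works as advertised" – is doing this continuously, across vendors, while links flap.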
And here we're at the heart of it. STP has been around for more than 15 years and it has been broken since before it was standardised. Networking engineers have been talking about alternatives to STP since before STP was formalised as a standard.
Alternatives to STP include Shortest Path Bridging (SPB) and Transparent Interconnection of Lots of Links (TRILL). There are also proprietary approaches and approaches involving making everything Layer 3 and cranking up the dollar value of all the bits involved. There are no clear winners yet, and with SDN we're already on to the next chapter before the saga is even finished.
For additional fun, after you sort out the physical connectivity issues between switches, you need switches to be able to exchange information about VLANs. Here we could go down rabbit holes like GARP, MRP, VTP, etc. and start holy wars in the comments.
A human has no trouble looking at the network map and saying "when the link between Switch 1 and Switch 3 is down, then to get to Switch 3 go via Switch 2". The switches, on the other hand, can get all manner of confused – and this diagram only has four switches.
Kicking complexity up a notch
To make modern networking more frightening, in addition to switches having some difficulty keeping who's who and what's where straight, those irritating server admins have virtualisation technology. Now they can move workloads from one server to another at will.
A VM that was on port 15 on Switch 3 can suddenly appear on port 2 on Switch 4. Not only is the server team not going to file a change request with networking before moving VMs around, there are automated systems built right into the hypervisor that move VMs around based on server load. There is no chance whatsoever that network admins can manually configure the interconnects between switches in a modern environment.
Some of the successor protocols to STP can be used to do this in an automated fashion. Unfortunately, they aren't all evenly supported across switch vendors. Worse, the various major vendors have picked their sides and spread pointless FUD about "the enemy's protocols". And you still then have to configure each switch to play nice with the protocol and so on and so forth*.
Cisco, Juniper, Hewlett Packard, Dell, Arista and any of the other big switch manufacturers can solve this problem for you. If you would kindly pay them the sum of exactly enough to make you say "uncle", they can make all of this go away.
Oh, you'll still have to do an awful lot of configuring, but they have handy certification courses and management tools that make it just easy enough that companies which already use their solutions won't look elsewhere.
SDN takes a different approach. In an SDN world, you register a switch with the control plane in a manner not fundamentally different from connecting an ESXi host to vSphere, or back-up agent to the back-up server. Once registered, the switch is instructed to discover all of the devices connected to it and report back. The control servers then calculate the new topology of the network and update every device with the new information.
If a link drops, this is detected and reported to the central servers. The central servers calculate the new topology and distribute it to all devices. Did you add a cable between two switches or move a VM? Recalculate and distribute.
Forget complicated protocols and configuration. Building the "index" of who's who and what's where is done centrally and pushed out to all devices with each change. Detecting, calculating and distributing this information is called reconvergence. Even though SDN does it in a different way to older protocols, the goal and the result are (mostly) the same.
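Put together, the register/detect/recalculate/distribute loop looks something like the toy controller below. Every class and method name is invented for illustration – real controllers speak protocols like OpenFlow and push actual flow rules – but the shape of SDN reconvergence is this:

```python
from collections import deque

class Controller:
    """Toy SDN controller: holds the topology, recomputes forwarding
    tables on every change, and 'pushes' them to each switch."""
    def __init__(self):
        self.links = set()   # undirected switch-to-switch links
        self.tables = {}     # per-switch forwarding table: dest -> next hop

    def link_up(self, a, b):
        self.links.add(frozenset((a, b)))
        self.reconverge()

    def link_down(self, a, b):
        self.links.discard(frozenset((a, b)))
        self.reconverge()

    def reconverge(self):
        """BFS from each switch to find its first hop towards every other
        switch; storing the result stands in for pushing it to the device."""
        switches = {s for link in self.links for s in link}
        adjacency = {s: [] for s in switches}
        for link in self.links:
            a, b = tuple(link)
            adjacency[a].append(b)
            adjacency[b].append(a)
        for src in switches:
            table = {}
            queue = deque((n, n) for n in adjacency[src])
            seen = {src} | set(adjacency[src])
            while queue:
                node, first_hop = queue.popleft()
                table[node] = first_hop
                for n in adjacency[node]:
                    if n not in seen:
                        seen.add(n)
                        queue.append((n, first_hop))
            self.tables[src] = table

ctl = Controller()
ctl.link_up(1, 2)
ctl.link_up(2, 3)
ctl.link_up(1, 3)
ctl.link_down(1, 3)      # cut the direct link...
print(ctl.tables[1][3])  # ...Switch 1 now reaches Switch 3 via 2
```

Add a cable, move a VM, lose a port: same loop, every time.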
NFV on top of SDN
Consider for a moment that in a "cloud" environment, nobody cares about the switches. Nobody wants to know if a link between switches is down and there is zero tolerance for any downtime required to fix it.
None of this matters to the end users, to the server admins, the storage admins or anyone else who has to actually use the network. It's all impediment and distraction. In fact, in addition to not caring about how packets get from A to B, they want to push all sorts of things off on to "the network" for automation.
Firewalls, intrusion detection and so much more were often part of network oversight. Traditionally these functions were performed by routers, Layer 3 switches or network appliances.
Increasingly these functions are simply spun up on demand as VMs. Getting this right requires complicated network flows operating in an automated fashion that no human-mediated change-control system could ever hope to cope with.
Spawning a "virtual service" in your corporate private cloud will not only light up a series of VMs handling an application, database, storage and analytics. It will also fire up network security VMs, trigger the creation of access control lists, add things to back-up and disaster recovery regimens and more.
All this will need to be added to monitoring packages and all of it has to happen at the push of a button.
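As a hand-wavy sketch of that push-button workflow – every function, VM and service name below is invented for illustration – the "one button, many actions" shape of NFV orchestration looks like this:

```python
def deploy_service(name):
    """Toy NFV orchestration: one request fans out into a chain of
    compute, network-function and ops tasks."""
    actions = []
    for vm in (f"{name}-app", f"{name}-db", f"{name}-analytics"):
        actions.append(f"spawn VM {vm}")
    actions.append(f"spawn firewall VM for {name}")       # network functions...
    actions.append(f"spawn IDS VM for {name}")            # ...as software, on demand
    actions.append(f"create ACLs for {name}")
    actions.append(f"register {name} with backup and DR")
    actions.append(f"add {name} to monitoring")
    return actions

for step in deploy_service("webshop"):
    print(step)
```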
The point of SDN is not merely to replace some protocols for network reconvergence with yet another way of doing the same thing. That's a critical part, but it is only the most basic building block. SDN's entire purpose is to plug into advanced infrastructure management stacks to combine the low-level networking stuff with the high-level stuff, so that all of what has been discussed here is seamless.
SDN blurs into NFV. NFV at scale needs SDN, because of the sheer number and speed of changes. SDN at scale needs NFV because simply automating reconvergence isn't enough. All those network functions that used to be in hardware need to be put into VMs and automated away in software.
Once upon a time, the internet was mostly computers with humans using them, talking to other computers with other humans using them. Today, the internet is mostly computers talking to other computers without humans involved. Those computers are the new user, and they need services to be created and destroyed faster than we puny fleshies can react.
Fortunately, most of us are doing at least some of the above already. Now we just need to get better at automating it. Welcome to SDN, NFV and the future. ®
*Chris Wahl points out the following: "this shouldn't impact STP as the edge ports connected to hypervisors should have STP disabled. And the hypervisor will use RARPs to flood source MACs upstream for quick address learning if a link fails. And in NSX, for example, ARPs and other BUM traffic are absorbed by the VTEP and responded to in-line."
He has a point; properly configured, your hypervisor-facing points should have STP disabled, and NSX does handle flooding back to the switches rather nicely. Of course, NSX is proper SDN and thus is handling some of the problems of today's networks on its own.
NSX is perfectly capable of handling flooding issues even on switches that aren't configured correctly, or which have a binary "enabled on all ports/disabled on all ports" option for STP. This reinforces the point: the days of the edge device trusting the network entirely are over. Edge devices like hypervisors are full participants in the network.