ACI & Kubernetes – The Cisco K8s CNI (Part Two)
In part two (Part One Here), I want to talk about what I see as the pros and cons of using the Cisco K8 CNI.
The Cisco K8 CNI provides a few different things:
- Core K8 Networking Plugin
- Open vSwitch (replacing kube-proxy & iptables)
- K8 Native LoadBalancer (Ingress Controller)
- REST API client for APIC configuration
- VXLAN overlay connected to ACI infra overlay-1 network thereby acting as a VXLAN TEP
OVS vs. Kube Proxy
Using OVS to replace the kube-proxy and iptables elements is certainly a benefit where we use the LoadBalancer for a deployment, because the CNI then manages the load balancing to the pods rather than kube-proxy. With CNIs that rely on kube-proxy, there is always that final kube-proxy load-balancing hop, no matter where we send the traffic in the deployment.
Having a native ingress controller is good and it implements the LoadBalancer well, but I believe this is where the implementation should stop. After creating the LB in K8s and assigning an EXTERNAL-IP, the CNI then configures the APIC with an L3Out EEPG, a contract and a PBR service graph (load balancing to all deployment pods). What frustrates me about this is mainly that the contract has a scope of VRF and, along with the other components of this configuration, it is effectively immutable: if the configuration is changed, the CNI reverts it. So I can’t change the scope of the contract to Tenant or Global, which restricts the use of the implementation. The same thing happens if I try to change the EEPG subnet (LB VIP) scope to make it exportable: the change is reverted. This means I cannot create a new contract with Tenant or Global scope that references the existing service graph, apply it to the EEPG as a provider and consume it from EPGs outside the EEPG’s VRF.
This is frustrating. It’s almost like you start down a road of trying to hack something to get it to work, but this is not a good path to go down; keeping things consistent in your deployments is key if they are to be scalable and maintainable.
I would like to see this immutable ‘feature’ removed so these things can be changed, or a flag (annotation) in the deployment configuration file to turn the ACI configuration on or off. Just create the LB in the CNI, assign the LB VIP address and stop there! I will sort the rest out: I can grab the LB VIP via the K8s API if I have used an API to create the deployment, then configure my own solution within ACI to route traffic to the VIP.
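As a rough illustration of the "grab the LB VIP via an API" step, here is a minimal Python sketch. It assumes you already have the Service object as a dict (for example the JSON returned by the K8s API or by `kubectl get svc -o json`), and reads the address from the standard `status.loadBalancer.ingress` field. The function name, the service name and the address are made up for the example.

```python
from typing import Optional


def lb_vip_from_service(service: dict) -> Optional[str]:
    """Return the first LoadBalancer ingress IP of a Service, or None if not assigned yet."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        ip = entry.get("ip")
        if ip:
            return ip
    return None


# Hypothetical Service object, shaped like the JSON the Kubernetes API returns.
example_service = {
    "kind": "Service",
    "metadata": {"name": "my-app-lb", "namespace": "default"},
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {"ingress": [{"ip": "192.0.2.10"}]}},
}

if __name__ == "__main__":
    print(lb_vip_from_service(example_service))
```

The returned address is what I would then feed into my own ACI configuration. While the EXTERNAL-IP is still pending the function returns None, so the caller would need to retry.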
In terms of the redirect policy and the VLAN 3950 host addresses, I could supply the deployment file with a tenant name and a redirect policy name, which the CNI could then configure & manage; I would then leverage this redirect policy myself in my own SG PBR configurations. This would be my preferred option.
I just find this too restrictive, almost like it’s trying to do too much and is over-engineered. Fix this and I am happy with this CNI.
The Cisco CNI supports multicast, as we have seen with the mcast application. Some CNIs, including the default kubenet and Calico (at least not yet, as of 2021), do not support multicast.
Deployment Pod EPG
There is of course the EPG created for the deployment, specified in the deployment configuration file as an annotation. This EPG’s endpoints are the deployment Pod IP addresses, so we can apply contracts to this EPG and permit access directly to Pod IPs, or add a PBR SG or an inline NLB (e.g. F5) if we want to provide load balancing.
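To make the EPG annotation concrete, here is a small Python sketch that builds a Deployment manifest as a dict. It assumes the annotation key is `opflex.cisco.com/endpoint-group` with a JSON value holding `tenant`, `app-profile` and `name`, which is how I understand the Cisco documentation; check this against your CNI version. The helper name and the tenant, application profile, EPG and image names are all hypothetical.

```python
import json


def deployment_with_epg(name, image, tenant, app_profile, epg, replicas=2):
    """Build a Deployment manifest whose pods should land in the given ACI EPG."""
    # Assumed annotation format: a JSON string naming tenant, app profile and EPG.
    epg_annotation = json.dumps(
        {"tenant": tenant, "app-profile": app_profile, "name": epg}
    )
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {
            "name": name,
            "annotations": {"opflex.cisco.com/endpoint-group": epg_annotation},
        },
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names throughout.
    manifest = deployment_with_epg("web", "nginx:stable", "k8s-tenant", "k8s-ap", "web-epg")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied as-is, since kubectl accepts JSON manifests as well as YAML.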
The F5 solution works well with the F5 App Center (ACI application), as it will detect (via an APIC REST subscription to fvCEp changes in an EPG) any IPs added to or removed from the EPG and add/remove them from the LTM server pool accordingly, which automates things when the deployment is scaled. You could do this yourself with a bit of code, using an API subscription and an SG redirect policy that you update accordingly, which is a little cheaper than F5 licences. 🙂
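The "bit of code" could look something like the Python sketch below. It only covers the reconcile logic, not the subscription itself: on each fvCEp event you would re-query the EPG's endpoints and diff them against the destinations currently in the redirect policy. It assumes the APIC query returns an `imdata` list of `fvCEp` objects carrying `ip` and `mac` attributes, which may differ between APIC versions. The function names and addresses are made up.

```python
def endpoints_from_imdata(imdata):
    """Map endpoint IP -> MAC from an APIC class query response (assumed fvCEp shape)."""
    endpoints = {}
    for item in imdata:
        attrs = item.get("fvCEp", {}).get("attributes", {})
        ip, mac = attrs.get("ip"), attrs.get("mac")
        # Skip endpoints that have not learned an IP yet.
        if ip and mac and ip != "0.0.0.0":
            endpoints[ip] = mac
    return endpoints


def plan_redirect_update(endpoints, redirect_dests):
    """Return (destinations to add or fix, IPs to remove) for the redirect policy."""
    to_add = {ip: mac for ip, mac in endpoints.items() if redirect_dests.get(ip) != mac}
    to_remove = sorted(ip for ip in redirect_dests if ip not in endpoints)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical data: two pods in the EPG, one stale redirect destination.
    imdata = [
        {"fvCEp": {"attributes": {"ip": "10.1.0.5", "mac": "00:00:00:00:00:05"}}},
        {"fvCEp": {"attributes": {"ip": "10.1.0.6", "mac": "00:00:00:00:00:06"}}},
    ]
    current = {"10.1.0.5": "00:00:00:00:00:05", "10.1.0.9": "00:00:00:00:00:09"}
    print(plan_redirect_update(endpoints_from_imdata(imdata), current))
```

Diffing the full endpoint list on every event, rather than trusting each individual event, means a missed notification gets corrected on the next pass.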
The VMM domain provides some basic information about the nodes, pods and deployments, which is nice but not as fully featured as a K8 dashboard. The VMM domain is also used, of course, in the EPGs connected to the cluster to identify which EPG each endpoint belongs to, as reported by the CNI to the APIC.
Documentation is nearly non-existent, and what is available at first glance does not cover the concepts behind what it’s trying to do, or what some configuration options are for and why they exist. In terms of updates, I don’t see release notes for the CNI or the provisioning tool anywhere.
I like the solution, but in all honesty, get rid of the automated, immutable L3Out disaster: provide an option not to deploy it, or options to only manage an SG PBR policy. This would make things more flexible and enable me to keep deployments more consistent.