KubeCon 2019 insights

Attended KubeCon 2019 in San Diego last week. Great conference: I liked the 30-minute presentation format and got more out of this one than any other I’ve attended. And late November is a great time to be in southern CA, though it’s almost disorienting to see the sun every day.

Key Takeaways

  1. Teams should be structured to minimize cognitive load in order to maximize effectiveness: “minimize cognitive load for others” and “use small, long-lived teams as the standard” – from the Team Topologies book authors
  2. DevEx (Developer Experience), a term new to me, is used by the Team Topologies book authors. Though it is easily confused with devx.com, I like it. “DevEx must be optimized to maximize feature flow”
  3. CD platform Spinnaker has really taken off with wide adoption and a healthy ecosystem – Spinnaker Summit, now in its third year, draws 500+ attendees over three days, just prior to KubeCon / CloudNativeCon. Armory offers a hosted Spinnaker solution. A couple of my ex-co-workers have presented at Spinnaker Summit and are active members of the community: Joel Vasallo, Steven Basgall
  4. Serverless Frameworks can mean different things, ranging from simple scale-to-zero to functions-as-a-service. Examples of the latter are OpenFaaS (which supports k8s, Swarm, ECS / Fargate) and the vendor-specific AWS Lambda, Google Cloud Functions, and Azure Functions; an example of the former is Knative (more of a serverless building-blocks technology, which appears destined to be commonly paired with k8s)
  5. Platform as a Product leads to a reliable platform
  6. Observability is largely about being smarter re unanticipated failure events (the “weird zone”) and refining craft in using the monitoring / alerting tooling one already has
  7. Security requires ongoing effort, “treat vulnerabilities like earthquakes” (they’re going to happen, maximize resiliency / recovery)
  8. Kubernetes (k8s) ecosystem is strong and getting stronger, with many using it in production. GKE is currently more widely used than EKS, but EKS is certainly working for some. “Kubernetes is the OS for the cloud”
  9. Service Mesh delivers the most benefit when most or all applications are on a service-mesh-ready deployment technology (incl. k8s, AWS ECS / Fargate). There are competing service mesh solutions (incl. Istio, Linkerd, the AWS-specific App Mesh, a la carte Envoy, and others), with perhaps no clear leader yet (though Knative may drive Istio adoption). “Service mesh does for service-to-service communication what Kubernetes has done for orchestration”

Favorite talks / workshops

The Elephant in the Kubernetes Room: Team Interactions at Scale – Manuel Pais, Independent (co-author of “Team Topologies”)  ** slides  **  Team Topologies book **  teamtopologies.com  **  How Airbnb Simplified the Kubernetes Workflow for 1000+ Engineers

I’m a third of the way into the book, and it’s the best I’ve read since Lean Enterprise. Really well-written, just the right balance of theory / case-study, and the diagrams are well chosen.

  • DevEx == Dev Experience – simplify, simplify, simplify
  • Platform as a Product – leads to a Reliable Platform
  • Cognitive Load – minimize it within any team, and optimize DevEx by minimizing it for Dev team customers (by providing easy-to-use abstractions)
  • Platform Team: Platforms fit for purpose, optimized for DevEx
  • Primary 3 team types: Stream-aligned (feature delivery), Enabling (DevOps embedding / support for stream-aligned), Platform
  • Platform should make it easier to do the right thing, encouraging dev teams to use the platform and not diverge and be on their own
  • Kubernetes should be a hidden impl. detail with the Platform providing abstractions for good DevEx

OpenFaaS Cloud + Linkerd: A Secure, Multi-Tenant Serverless Platform – Charles Pretzer, Buoyant & Alex Ellis, OpenFaaS, LTD  ** slides

  • OpenFaaS is “Serverless 2.0” (Any code, Anywhere), vs. “Serverless 1.0” vendor-specific platforms incl. AWS Lambda, Google Cloud Functions, Azure Functions
  • Anywhere means k8s, Swarm, ECS / EKS, Datacenter, Local
  • Two options: OpenFaaS, and OpenFaaS Cloud, which bundles Git-based CI / CD, Runtime secrets, OAuth, Linkerd-based Service Mesh
  • Simplicity: a short stack.yml + handler.js for a simple Node.js example, vs. 6 distinct, longer configs for k8s. Note that a Dockerfile is optional with OpenFaaS but is supported
  • Linkerd (bundled with “OpenFaaS Cloud”): the only service mesh currently in CNCF (incubating). Actionable metrics, deep runtime diagnostics, CLI debugging, 60sec install, lightweight. Traffic-splitting, mTLS, dashboard showing routing paths.

No-Nonsense Observability Improvement – Cory Watson, SignalFx ** slides

  • “The Normal Zone” includes monitoring for Anticipated behaviors
  • The “Weird Zone” is about Observability of Unanticipated behaviors
  • Observability will be one of your most expensive projects
  • Incident measures++: traditional ones incl. MTTD, MTTR can be lame; instead look to the Nora Jones Cyclic Approach: Difficulties in Understanding, System-specific failure rates, Surprises, Lack of ownership, Near misses
  • Automation – need to avoid humans being “out of the loop”
  • Invest according to risk and need
  • Understand the use cases

Making an Internal Kubernetes Offering Generally Available – James Wen, Spotify  **  slides

  • “Take complexity for your developers” (more complex devops tooling, better abstractions can be worth it for a better DevEx)
  • Between the extremes “Complete Team Autonomy” and “Centralized Ops” is their happy medium: Ops (embedded) in teams, Core-Infra Org, Golden Path
  • “Establish trust through monitoring”
  • Metrics incl.: Status of backups
  • “If you don’t have restored backups, you don’t actually have backups”
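
To make the last two bullets concrete, here is a minimal sketch (mine, not from the talk) of a “status of backups” check that only reports a backup as healthy once a test restore has succeeded. The `BackupRecord` fields, the status strings, and the one-day freshness window are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BackupRecord:
    """Hypothetical record of one backup; field names are illustrative."""
    taken_at: datetime
    restore_verified_at: Optional[datetime] = None  # None: never test-restored


def backup_status(record: BackupRecord, now: datetime,
                  max_age: timedelta = timedelta(days=1)) -> str:
    """Classify a backup as 'unverified', 'stale', or 'ok'."""
    # A backup that has never been restored is not counted as a backup.
    if record.restore_verified_at is None:
        return "unverified"
    # A verified backup still has to be recent enough to be useful.
    if now - record.taken_at > max_age:
        return "stale"
    return "ok"
```

A status like this could be exported as a gauge per service, so an alert fires on anything other than "ok" rather than only on a failed backup job.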

Doing Things Prometheus Can’t Do with Prometheus – Tim Simmons, DigitalOcean **  slides

  • Metrics need to be Actionable, Contextual
  • Learn existing tools deeply – more valuable than shiny new ‘observability’ tools
  • Jeff Smith: “Maintenance is Revenue Protection”
  • Anomaly detection can often be done easily with custom code; you don’t always need a product with that feature
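
As an illustration of the anomaly-detection point, here is a minimal sketch of a trailing-window z-score check in plain Python. This is my own example, not the approach shown in the talk; the function name, window size, and threshold are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

Fed a series of samples pulled from an existing metrics store, this flags sudden spikes or drops. The trade-off is that a spike inflates the window’s standard deviation for the next few points, so follow-on anomalies can be masked.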

How Yelp Moved Security From the App to the Mesh with Envoy and OPA – Daniel Popescu, Yelp & Ben Plotnick, Cruise ** slides

  • OPA (Open Policy Agent) case study
  • OPA incl. unit-testability
  • OPA decision logs published to log collector (Splunk)
  • AuthN, AuthZ via Envoy sidecar
  • For projects like this, start from the use cases, be mindful of scope creep

Design Decisions for Communication Systems – Eric Anderson, Google  **  slides

  • Excellent historical context / pros & cons of Messaging mechanisms incl. gRPC, REST, Unix socket, TCP socket, older ones incl. DCOM / CORBA

Weaveworks EKS AppMesh Gitops workshop

  • Pretty well-constructed workshop targeting EKS & AppMesh with a GitOps workflow, using Flux as a k8s operator to promote container images
  • If you were at the conference, you got a nice Cuttlefish shirt with proof of completing this, pictured at the link above