KubeCon 2019 insights

Attended KubeCon 2019 in San Diego last week. Great conference, liked the 30min prezo format, got more out of this one than any I’ve attended. And, late November is a great time to be in southern CA, though it’s almost disorienting to see the sun every day.

Key Takeaways

  1. Teams should be structured to minimize cognitive load in order to maximize effectiveness, “minimize cognitive load for others”, “use small, long-lived teams as the standard” – from the Team Topologies book authors
  2. DevEx (Developer Experience) term (new to me) used by the Team Topologies book authors – though this is easily confused with devx.com, I like it. “DevEx must be optimized to maximize feature flow”
  3. CD platform Spinnaker has really taken off with wide adoption and a healthy ecosystem – Spinnaker Summit, now in its third year, draws 500+ attendees over three days, just prior to KubeCon / CloudNativeCon. Armory offers a hosted Spinnaker solution. A couple of my ex-co-workers have presented at Spinnaker Summit and are active members of the community: Joel Vasallo, Steven Basgall
  4. Serverless Frameworks can mean different things, ranging from simple scale-to-zero to functions-as-a-service. Examples of the latter are OpenFaaS (which supports k8s, Swarm, ECS / Fargate) and the vendor-specific AWS Lambda, Google Cloud Functions, Azure Functions; an example of the former is Knative (more of a serverless building-blocks technology, which appears destined to be commonly paired with k8s)
  5. Platform as a Product leads to a reliable platform
  6. Observability is largely about being smarter re unanticipated failure events (the “weird zone”) and refining craft in using the monitoring / alerting tooling one already has
  7. Security requires ongoing effort, “treat vulnerabilities like earthquakes” (they’re going to happen, maximize resiliency / recovery)
  8. Kubernetes (k8s) ecosystem is strong and getting stronger, many using it in production, GKE currently more widely used than EKS, but EKS is certainly working for some. “Kubernetes is the OS for the cloud”
  9. Service Mesh delivers the most benefit when most or all applications are on a service-mesh-ready deployment technology (incl. k8s, AWS ECS / Fargate). There are competing service mesh solutions (incl. Istio, Linkerd, the AWS-specific App Mesh, a la carte Envoy, and others), with perhaps no clear leader yet (though Knative may drive Istio adoption). “Service mesh does for service-to-service communication what Kubernetes has done for orchestration”

Favorite talks / workshops

The Elephant in the Kubernetes Room: Team Interactions at Scale – Manuel Pais, Independent (co-author of “Team Topologies”)  ** slides  **  Team Topologies book **  teamtopologies.com  **  How Airbnb Simplified the Kubernetes Workflow for 1000+ Engineers

I’m a third of the way into the book, and it’s the best I’ve read since Lean Enterprise. Really well-written, just the right balance of theory / case-study, and the diagrams are well chosen.

  • DevEx == Dev Experience – simplify, simplify, simplify
  • Platform as a Product – leads to a Reliable Platform
  • Cognitive Load – minimize it within any team, and optimize DevEx by minimizing it for Dev team customers (by providing easy-to-use abstractions)
  • Platform Team: Platforms fit for purpose, optimized for DevEx
  • Primary 3 team types: Stream-aligned (feature delivery), Enabling (DevOps embedding / support for stream-aligned), Platform
  • Platform should make it easier to do the right thing, encouraging dev teams to use the platform and not diverge and be on their own
  • Kubernetes should be a hidden impl. detail with the Platform providing abstractions for good DevEx

OpenFaaS Cloud + Linkerd: A Secure, Multi-Tenant Serverless Platform – Charles Pretzer, Buoyant & Alex Ellis, OpenFaaS, LTD  ** slides

  • OpenFaaS is “Serverless 2.0” (Any code, Anywhere), vs. “Serverless 1.0” vendor-specific platforms incl. AWS Lambda, Google Cloud Functions, Azure Functions
  • Anywhere means k8s, Swarm, ECS / EKS, Datacenter, Local
  • Two options: OpenFaaS, and OpenFaaS Cloud, which bundles Git-based CI / CD, Runtime secrets, OAuth, Linkerd-based Service Mesh
  • Simplicity (a short stack.yml + handler.js for a simple Node.js example, vs. 6 distinct, longer configs for k8s). Note that a Dockerfile is optional with OpenFaaS, but is supported
  • Linkerd (bundled with “OpenFaaS Cloud”): Only service mesh currently in CNCF (incubating): Actionable metrics, Deep runtime diagnostics, CLI-debugging, 60sec install, lightweight. Traffic-splitting, mTLS, Dashboard showing routing paths.
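
To give a feel for how small the function side can be, here is a minimal sketch of a handler in Python rather than the Node.js shown in the talk. It assumes the classic OpenFaaS python3 template, which to my understanding calls a `handle(req)` function with the request body as a string; the upper-casing logic is purely illustrative and not from the talk.

```python
# handler.py - a minimal sketch, assuming the classic OpenFaaS "python3"
# template, where the platform calls handle(req) with the request body.

def handle(req):
    """Return the request body in upper case (illustrative logic only)."""
    return req.upper()


if __name__ == "__main__":
    # Local smoke test; in a deployment the template's entry point calls handle().
    print(handle("hello from openfaas"))
```

The stack.yml that names the function and its template is still needed alongside this; it is omitted here since its exact contents depend on your gateway and image registry.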

No-Nonsense Observability Improvement – Cory Watson, SignalFx ** slides

  • “The Normal Zone” includes monitoring for Anticipated behaviors
  • The “Weird Zone” is about Observability of Unanticipated behaviors
  • Observability will be one of your most expensive projects
  • Incident Measures++ – the traditional ones, incl. MTTD and MTTR, can be lame – instead look to Nora Jones’ Cyclic Approach: Difficulties in Understanding, System-specific failure rates, Surprises, Lack of ownership, Near misses
  • Automation – need to avoid human being “out of the loop”
  • Invest in risk and need
  • Understand the use cases

Making an Internal Kubernetes Offering Generally Available – James Wen, Spotify  **  slides

  • “Take complexity for your developers” (more complex devops tooling, better abstractions can be worth it for a better DevEx)
  • Between the extremes “Complete Team Autonomy” and “Centralized Ops” is their happy medium: Ops (embedded) in teams, Core-Infra Org, Golden Path
  • “Establish trust through monitoring”
  • Metrics incl.: Status of backups
  • “If you don’t have restored backups, you don’t actually have backups”

Doing Things Prometheus Can’t Do with Prometheus – Tim Simmons, DigitalOcean **  slides

  • Metrics need to be Actionable, Contextual
  • Learn existing tools deeply – more valuable than shiny new ‘observability’ tools
  • Jeff Smith: “Maintenance is Revenue Protection”
  • Anomaly detection can be easily done with custom code, don’t always need a product with that feature
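
As a rough illustration of the “custom code” point, here is a minimal sketch of one common approach: flag a sample when it sits more than a few standard deviations away from the mean of a trailing window. This is not the method from the talk; the function name, window size, and threshold are all my own assumptions.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate from the trailing window's
    mean by more than `threshold` standard deviations.

    Hypothetical helper for illustration; window and threshold are guesses
    that would need tuning against real metrics.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    latencies_ms = [10, 11, 10, 12, 11, 10, 50, 11, 10]
    print(find_anomalies(latencies_ms))  # prints [6]
```

Note that once a spike enters the trailing window it inflates the standard deviation, so the samples right after it are less likely to be flagged; a median-based variant would be more robust if that matters.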

How Yelp Moved Security From the App to the Mesh with Envoy and OPA – Daniel Popescu, Yelp & Ben Plotnick, Cruise ** slides

  • OPA (Open Policy Agent) case study
  • OPA incl. unit-testability
  • OPA decision logs published to log collector (Splunk)
  • AuthN, AuthZ via Envoy sidecar
  • For projects like this, start from the use cases, be mindful of scope creep

Design Decisions for Communication Systems – Eric Anderson, Google  **  slides

  • Excellent historical context / pros & cons of Messaging mechanisms incl. gRPC, REST, Unix socket, TCP socket, older ones incl. DCOM / CORBA

Weaveworks EKS AppMesh Gitops workshop

  • Pretty well-constructed workshop targeting EKS & AppMesh with a GitOps workflow, using Flux as a k8s operator to promote container images
  • If you were at the conference, you got a nice Cuttlefish shirt with proof of completing this, pictured at the link above

Whale riding Docker in a sea of Microservices

make development more consistent and deployment more reliable


Saw a couple of interesting talks on Docker / Microservices last week – “State of the Art in Microservices”, the DockerCon Europe 2014 keynote, by Adrian Cockcroft; and “Docker in Production – Reality, Not Hype”, at the March-2015 DevOps-Chicago meetup, by Bridget Kromhout (links below).

Adrian’s Microservices talk was interesting in that it was not limited to the purely technical realm of Microservices and Docker, but also described the organizational culture and structure needed to make it work:

  • Breaking Down the SILOs – a traditional “Monolithic Delivery” team must interface with each of 8 autonomous silo groups in his example, often using ticket-driven, bottleneck-prone workflow, vs. having two cross-functional “Microservices” teams (Product Team, Platform Team) which each span formerly-silo’d areas of expertise – making the point that introducing these DevOps-oriented cross-functional teams is a Re-Org
  • Microservice-based production updates may be made independently of other service updates, which lets each Microservice team deliver continuously, with fewer bottlenecks and higher throughput as a result – contrasted with Monolithic Delivery deployments, which work well only with a small number of developers and a single language in use
  • Docker containers facilitate the above by isolating configurations for each Microservice in their own containers, which are resource-light and start in seconds (and might live for only minutes), vs. a traditional VM-based approach which is more resource-hungry, starts in minutes and is typically up for weeks
  • Microservice Definition: Loosely coupled service oriented architecture with bounded contexts – this is the most succinct definition I’ve seen, contrasted with the broader SOA term, which can describe either a loosely or a tightly coupled architecture (the latter often in the form of RPC-like WSDL / SOAP implementations) – loose coupling is essential for the independent production updates mentioned above, with bounded contexts (how much a service has to know about other services) an indication of loose coupling. A common example of a tightly-coupled system is a centralized database schema, with the database being the “contract” between two or more components
  • AWS Lambda is an interesting service that scales on demand with 100ms granularity (currently in preview)
  • Example Microservice architectures shown for: Netflix, Twitter, Gilt, Hailo
  • Opportunity identified – of Docker Hub as an enterprise app store for components
  • Book Recommendation – Lean Enterprise: Adopting Continuous Delivery, DevOps and Lean Startup at Scale

Bridget’s talk about how DramaFever uses Docker in production (since late 2013) described some of the benefits of using Docker:

  • Development more consistent – when developers share docker containers for their environment, it both reduces friction during development and eases deployment handoff to shared-dev, QA, staging, production environments. Another side benefit is a production container can be easily and quickly pulled by a developer to a local environment to troubleshoot. In their case they went from a 17min Vagrant-based developer setup (which also differed from production in its configuration) to a < 1min Docker-based one
  • Deployment more repeatable – scaling via provision-on-demand may be done more confidently and in a more automated fashion knowing that the containers are correct. They take the exact image from the QA environment and promote it to Staging then Prod

… and some technical details / challenges:

  • Docker containers in the build pipeline – Docker base images (main app, MySQL emulation of AWS-RDS) are built weekly, and Microservice-specific builds of Docker containers are dictated by the Dockerfiles in Git source control – she heavily emphasized the importance of a fully-automated, build-server-driven build and deploy pipeline (Jenkins in their case), with no laptops in the build pipeline
  • Monitoring beyond the high-level view offered by AWS CloudWatch is implemented via Graphite and Sentry
  • Fig (now named “Compose”) is used to help containers find each other
  • “Our Own Private Registry” – they found it worked best to run local registries on each workstation rather than a centralized private registry
  • “Getting the Logs out” – host filesystem may be mounted from within the Docker container, to facilitate log export
  • “Containerize all the things” – they use Docker for most things, but have found Chef more effective for some of the infrastructure pieces such as Graphite. As she put it, you need to decide “what do you bake in your images vs. configure on the host after the fact”
  • “About those Race Conditions” – they use the Jenkins “Naginator” plugin to automatically re-run jobs which fail with certain messages such as “Cannot destroy container”
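
The retry-on-known-message idea from the last bullet is easy to picture outside of Jenkins too. Below is a minimal sketch of the pattern in plain Python, not the Naginator plugin itself: re-run a job only when its failure message matches a known transient error. The function name, the use of `RuntimeError`, and the attempt count are all assumptions for illustration.

```python
import time

# Failure messages considered transient; the first one is quoted in the talk.
RETRYABLE_MESSAGES = ("Cannot destroy container",)


def run_with_retries(job, max_attempts=3, delay_seconds=0.0,
                     retryable=RETRYABLE_MESSAGES):
    """Call job() and re-run it only if it fails with a known transient message.

    Hypothetical helper: any other failure, or a transient failure on the
    final attempt, is re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except RuntimeError as err:
            is_retryable = any(msg in str(err) for msg in retryable)
            if not is_retryable or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_job():
        # Fails twice with a transient message, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("Cannot destroy container abc123")
        return "deployed"

    print(run_with_retries(flaky_job))  # prints deployed
```

Matching on specific messages rather than retrying everything keeps genuine failures (a broken build, a failing test) visible on the first run.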

I’m looking forward to leveraging Docker to help optimize the deployment process for my current project, which will become even more important as we move toward a more Microservice-based architecture.