Q&A with Jeff Smith on His DevOpsDays NZ Keynote on DevOps Transformations

Jeff Smith, manager of production operations for Centro, a Chicago-based organisation which provides a platform for digital marketing, will deliver a keynote at DevOpsDays NZ this November in Wellington, New Zealand, with a talk titled Moving from Ops to DevOps: Centro’s Journey to the Promiseland. Smith also spoke at DevOpsDays Indianapolis in July about the misalignment which can arise in an organisation simply because professional silos examine the same subject matter through differing motivational lenses.

InfoQ caught up with Smith to discuss Centro’s journey and compare it with Grubhub’s DevOps transformation, which he spoke about in 2016 at DevOpsDays Minneapolis.

InfoQ: You talked at DevOpsDays Indianapolis this July about the misalignment which can arise from the differently biased “lenses” through which various parts of an organisation look at the same problem. Can you tell us a little more about this?

InfoQ: How did Centro deal with this misalignment in understanding?

InfoQ: Your talk for DevOpsDays NZ will be about Centro’s journey from Ops to DevOps. What do these two terms mean to you?

InfoQ: How was this journey affected by Centro’s own particular starting state and context?

InfoQ: What types of challenges did Centro and its teams have to overcome during this transformation?

InfoQ: How has the culture at Centro changed since its DevOps transformation?

Smith: People want to know why a thing is being done. That’s probably the biggest change I’ve witnessed. The act of asking “why?” shows that there is a level of engagement that goes beyond just getting a request off their plate so they can move on to the next thing. When people don’t understand something, they ask good probing questions in order to understand it. One simple question about a failed job run might end with an in-depth discussion of how Write Ahead Log replication in the database works. People just naturally want to learn more!

InfoQ: What measures do you take to avoid reverting to practice-based silos?

Smith: Discipline in how we approach problems. A knee-jerk reaction when something goes wrong is to add another layer of approval or another layer of supervision, but it doesn’t solve any of the actual issues that led to the situation you’re in. We also really enjoy doing post-mortems, but we focus on the human side of the problem rather than the order of events. What were people thinking when they made a particular decision? Why did that seem like a rational choice? What can we learn about the incident to make sure we haven’t slipped into an old way of doing things? We have to talk about things that go wrong in a much deeper sense than the way we typically talk about failure in retributive cultures.

InfoQ: How have non-technical partners responded to Centro’s journey to a DevOps culture?

Smith: Non-technical resources honestly don’t interface with us at the level where they see the cultural differences, but they see the change in capabilities. When a person can have their own mini-environment to test out a specific feature, it’s powerful. And when they can get it in minutes as opposed to days, it speaks to the types of changes and the speed at which we intend to move.

InfoQ: You previously spoke at DevOpsDays Minneapolis about Grubhub’s DevOps transformation. How did these two experiences compare?

InfoQ: Were there any particular lessons which stand out across both journeys?

Smith: People are always looking for a better way to do things. It’s not desire that stops transformations from happening, but fear. Fear of failure, fear of mistakes, fear of change. Ignoring fear is not a winning solution. Fear is like any other emotion: it needs to be acknowledged and accepted as real before it can be managed and dealt with.

InfoQ: What approaches have proven themselves to you in dealing with this general presence of fear?

Smith: The first thing you can do as a leader is to show vulnerability. You can’t be afraid to admit mistakes because your team and others around you will follow your example. Admit when you don’t know something in order to show that you don’t have the answers and that an inquisitive nature is not only allowed, but encouraged! Be curious about what others are doing in your environment and give them a chance to teach you something. Fear is rooted in a lack of trust, so before you can dispel fear, you need to build up trust.

Ultimately something will happen that is an incident or an outage. When that happens, look beyond the who did what when. Look deeper into the real root of the problem, not just that Frank restarted the nodes during the day. Why did Frank restart the nodes? What signals led into his decision making? What signals were missing that would have informed Frank that this was a bad decision? How do you bring all of these things to light through training, through radiating information, so that the next time this situation arises, we’ve addressed all the parts of the system that contribute to someone making an error? Once you demonstrate this willingness to indict not just the person, but all of the little factors that led up to the event, you generate an open, honest environment.

One last important thing: there’s a right way and a wrong way to fail. If you deploy some new code that’s showing signs of instability and we then need to roll it back with another deploy, that’s one way of failing. The preferred way is if we can feature flag code. If we can turn it on and off without deploying, but through configuration, that’s the way I want my organization to fail. Celebrate those times! When you have a developer that deploys code, monitors the telemetry, sees the problem and turns the feature off, that’s still a win. You’re never going to eliminate failure. But you can eliminate failing poorly.
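The idea of turning a feature on and off through configuration rather than a deploy can be illustrated with a minimal sketch. It is not Centro’s implementation; the JSON flag file, the `FeatureFlags` class, and the `new_checkout` flag name are all hypothetical and chosen only for illustration.

```python
import json


class FeatureFlags:
    """Hypothetical flag store backed by a JSON file.

    The file is re-read on every check, so editing the file changes
    behaviour without deploying new code.
    """

    def __init__(self, path):
        self.path = path

    def is_enabled(self, name, default=False):
        try:
            with open(self.path) as f:
                flags = json.load(f)
        except (OSError, json.JSONDecodeError):
            # Missing or unreadable config: fall back to the safe default.
            return default
        if not isinstance(flags, dict):
            return default
        return bool(flags.get(name, default))


def checkout(flags):
    # The new code path ships "dark" and is only used when the flag is on.
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"
```

In this sketch, a developer who sees a problem in the telemetry would set `new_checkout` to `false` in the flag file, and the next call to `checkout` would take the old path with no rollback deploy.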

InfoQ: Which of the DevOps Topology Patterns best describe the model of DevOps adopted by Centro?

Smith: We’re currently a Type 1 DevOps organization with the goal of moving to a hybrid of Type 2 and Type 3. I’d love to live in a world where the Ops team is simply providing a platform for developers in an Infrastructure as a Service kind of model, with the sprint teams moving to a “you build it, you run it” kind of mode. There are a few hurdles here.

  1. We need to get to a place where we’ve codified the typical patterns that our applications have in terms of infrastructure, in order to reliably provide those services in an automated fashion. Sometimes your design can get too specific, so when someone needs to adopt a new service, the infrastructure template you’ve created isn’t compatible.
  2. We need to continue to level up developers on their understanding of the Operations point of view. It’s less about technical knowledge and more about thinking from the perspective of resiliency, failure tolerance and focusing on the critical path of your application. When you go to the Amazon home page, there are a lot of separate services involved. But the only thing you actually need on that page is search and shopping cart functionality. Everything else can be short-circuited in the event of a failure to deliver those key features. Getting developers to constantly think in that mindset is an opportunity to get them thinking more from the Operations POV.
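The critical-path thinking in the second point can be sketched roughly as follows. This is only an illustration under stated assumptions: the `render_page` function and the component names are hypothetical, and a real page would involve network calls, timeouts and monitoring that are left out here.

```python
def render_page(critical, optional):
    """Assemble a page from named component fetchers.

    `critical` and `optional` map a component name to a zero-argument
    callable. A critical failure aborts the page; an optional failure
    is short-circuited and the component is simply left out.
    """
    page = {}
    for name, fetch in critical.items():
        # No try/except: if search or the cart is down, the page is down.
        page[name] = fetch()
    for name, fetch in optional.items():
        try:
            page[name] = fetch()
        except Exception:
            # Degrade gracefully: render the page without this component.
            page[name] = None
    return page
```

Here search and the shopping cart would be passed as `critical`, while something like recommendations would be passed as `optional`, so an outage in the latter never blocks the former.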

You can catch Smith’s keynote at DevOpsDays NZ in Wellington, New Zealand, on 5th and 6th November.

