Wednesday, October 9, 2024

Accelerate

Notes for the book Accelerate - The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, Jez Humble and Gene Kim.

Brief summary

The book describes the State of DevOps Report research conducted by Google's DevOps Research and Assessment (DORA) team. It covers what the research found and how it was carried out, and closes with a case study of transforming an organization.

The book/research introduces four "DORA metrics" for measuring an organization's software delivery performance:

  • Change Lead Time
  • Deployment Frequency
  • Change Failure Rate
  • Mean Time to Recovery (MTTR)

They also present 24 capabilities to drive improvements in software delivery performance, classified into five categories:

  • Continuous delivery
  • Architecture
  • Product and process
  • Lean management and monitoring
  • Cultural

For more details, see reference of 24 capabilities.

Part One - What we found

Chapter 1 - Accelerate

  • Focus on capabilities, not maturity
  • By focusing on a capabilities paradigm and the right capabilities, organizations can continuously drive improvement.
  • The research has identified 24 key capabilities that are easy to define, measure and improve.

Chapter 2 - Measuring performance

  • First define a valid, reliable measure of software delivery performance
  • Challenges with many existing ways to measure
    • They focus on outputs rather than outcomes
    • They focus on individual or local measures rather than team or global ones.

In search of measures of delivery performance that avoid the above challenges, the research settled on the following four:

  • Change Lead Time
    • Arising from lead time in Lean theory
    • Here, the delivery part of the lead time, excluding the design part
    • The time it takes to go from code committed to code successfully running in production.
  • Deployment Frequency
    • Arising from batch size
    • Deployment frequency used as a proxy for batch size since it is easy to measure and typically has low variability.
  • Change Failure Rate
  • Mean Time to Recovery (MTTR)
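
As a rough illustration of the definitions above, here is a minimal Python sketch that derives the four metrics from a list of deployment records. The `Deployment` record shape, the `dora_metrics` function, and the sample data are all hypothetical and not taken from the book. Using the median for lead time and the mean for recovery time is likewise an assumption made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it was running in production
    failed: bool = False                     # did the change degrade service?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarize the four metrics over a reporting period of `period_days` days."""
    if not deployments:
        raise ValueError("need at least one deployment")
    # Change lead time: commit -> running in production, in hours.
    lead_times_h = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    failures = [d for d in deployments if d.failed]
    # Time to recovery: failed deployment -> service restored, in hours.
    restore_times_h = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "median_lead_time_hours": median(lead_times_h),
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mean(restore_times_h) if restore_times_h else 0.0,
    }


# Made-up sample: four deployments in one week, one of which failed.
t0 = datetime(2024, 10, 1, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deployment(
        t0 + timedelta(days=2),
        t0 + timedelta(days=2, hours=6),
        failed=True,
        restored_at=t0 + timedelta(days=2, hours=7),
    ),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=8)),
]
metrics = dora_metrics(sample, period_days=7)
print(metrics)
```

Note that all four numbers are computed per team or service over a period, not per individual, which matches the point above about avoiding individual or local measures.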

Continuing with these metrics

  • Surprisingly, there is no trade-off between improving performance and achieving higher levels of stability and quality.
  • Figure 2.4: Software delivery performance impacts Organizational Performance and Noncommercial Performance
  • Important: Distinguishing which software is strategic and which isn't, and managing them appropriately.
  • IMPORTANT: Use these tools carefully
    • In organizations with a learning culture, they are incredibly powerful.
    • In pathological and bureaucratic organizational cultures, measurement is used as a form of control and people hide information that challenges existing rules, strategies, and power structures.
    • Deming: "Whenever there is fear, you get the wrong numbers"

Chapter 3: Measuring and Changing Culture

  • Organizational culture can exist at three levels in organizations: Basic assumptions, values and artifacts (Schein 1985)
  • The research uses a model of organizational culture defined by Ron Westrum.
  • Westrum typology of organizational cultures
    • Pathological (power-oriented, characterized by fear and threat)
    • Bureaucratic (rule-oriented)
    • Generative (performance-oriented, mission-focused)
  • Westrum's insight is that organizational culture predicts the way information flows through an organization.
  • Westrum's three characteristics of good information
    • It provides answers to the questions that the receiver needs answered.
    • It is timely.
    • It is presented in such a way that it can be effectively used by the receiver.
  • Bureaucracy is not necessarily bad. The goal of bureaucracy is to "ensure fairness by applying rules to administrative behavior ..." (Mark Schwartz)
    • Westrum's rule-oriented culture is perhaps best thought of as one where following the rules is considered more important than achieving the mission
  • Figure 3.2: Westrum organizational culture impacts Software Delivery Performance and Organizational Performance
  • References to Google's Project Aristotle research on team performance (2015), "it all comes down to team dynamics"
  • Accident investigations that stop at "human error" are dangerous. Human errors should be the start of the investigation, instead.
  • How to change culture? John Shook on transforming the culture of teams (How to Change a Culture: Lessons From NUMMI):

What my NUMMI experience taught me that was so powerful was that the way to change culture is not to first change how people think, but instead to start by changing how people behave — what they do.

  • Figure 3.3: Continuous Delivery and Lean Management impact Westrum Organizational Culture.

Chapter 4: Technical Practices

  • Technical practices are an enabler of more frequent, higher-quality and lower-risk software releases.
  • Continuous Delivery
  • For Figure 4.2, see Accelerate Digest / Impact of CD
  • Going through various technical practices.
  • Pick: Test automation is important
    • but having automated tests primarily created and maintained by a separate party is not correlated with IT performance.
    • Testers also serve an essential role: performing manual testing (exploratory, usability, and acceptance testing) and helping to create and evolve automated tests by working alongside developers.

Chapter 5: Architecture

  • High performance is possible with all kinds of systems, provided that systems - and the teams that build and maintain them - are loosely coupled.
  • Situation likely at low performers
    • Software they were building was custom software developed by another company.
    • Working on mainframe systems. (Interestingly, integrating against mainframe systems was not significantly correlated with performance)
  • Importance of focusing on architecture characteristics rather than implementation details of your architecture.
  • Deployability and testability are important for creating high performance.
  • "Inverse Conway Maneuver" mentioned
  • The goal of loosely coupled architecture is to
    • ensure that the available communication bandwidth (between teams) isn't overwhelmed by implementation-level details but can be used for discussing higher-level shared goals and how to achieve them.
    • enable scaling

Chapter 6: Integrating Infosec into the Delivery Lifecycle

  • Arguably the DevOps movement is poorly named.
  • The original intent of the DevOps movement was - in part - to bring together developers and operations teams to create win-win solutions in the pursuit of system-level goals
  • The underlying problem is not limited to development and operations; it occurs whenever different functions within the software delivery value stream do not work effectively together.
  • "Shift left" on security
    • Build it into software delivery process instead of making it a separate phase happening downstream in the process.
    • Impacts ability to practice continuous delivery
    • Shift from security teams doing reviews themselves to giving the developers the means to build security in.
  • Related: cloud.gov is now FedRAMP Authorized for use by federal agencies

Chapter 7: Management Practices for Software

  • Lean management, as applied to software development, is modeled with three components
    • Limiting work in progress (WIP)
    • Creating and maintaining visual displays showing key metrics etc.
    • Using data from application performance and infrastructure monitoring tools to make business decisions.
  • WIP limits on their own did not strongly predict delivery performance.
    • Only when combined with the use of visual displays and a feedback loop from production monitoring tools back to delivery teams or the business.
  • Interesting quote on approval processes

External approvals were negatively correlated with lead time, deployment frequency, and restore time, and had no correlation with change fail rate. In short, approval by an external body (such as a manager or CAB) simply doesn’t work to increase the stability of production systems, measured by the time to restore service and change fail rate. However, it certainly slows things down. It is, in fact, worse than having no change approval process at all.

Chapter 8: Product Development

  • Eric Ries' Lean Startup mentioned
    • Synthesis of ideas from the Lean movement, design thinking, and the work of entrepreneur Steve Blank, emphasizing importance of taking an experimental approach to product development.
  • Figure 8.2: Lean Product management impacts
    • Westrum Organizational Culture, which impacts Organizational Performance
    • Organizational Performance (straight)
    • Software Delivery Performance, which impacts Organizational Performance
    • Less Burnout.

Chapter 9: Making Work Sustainable

  • Deployment pain/fear can tell a lot about a team's software delivery performance.
  • Fundamentally, most deployment problems are caused by a complex, brittle deployment process.
  • This is typically a result of 3 factors
    • SW is often not written with deployability in mind
    • Probability of a failed deployment rises substantially when manual changes must be made to production environment as part of the deployment process.
    • Complex deployments often require complex handoffs between teams.
  • Six organizational risk factors that predict burnout
    • Work overload
    • Lack of control
    • Insufficient rewards
    • Breakdown of community
    • Absence of fairness
    • Value conflicts

Chapter 10: Employee Satisfaction, Identity, and Engagement

  • Employees on high-performing teams were 2.2 times more likely to recommend their organization to a friend
  • Research recommending diverse teams: Rock and Grant 2016, Deloitte 2013, Hunt et al 2013

Chapter 11: Leaders and Managers

  • Transformational leadership
    • Leaders inspire and motivate followers to achieve higher performance by appealing to their values and sense of purpose, facilitating wide-scale organizational change.
  • Model for transformational leadership with five characteristics (Rafferty and Griffin 2004)
    • Vision
    • Inspirational communication
    • Intellectual stimulation
    • Supportive leadership
    • Personal recognition
  • Three things that are highly correlated with SW delivery performance and contribute to a strong team culture
    • Cross-functional collaboration
    • A climate for learning
    • Tools

Part Two - The Research

  • Presenting the science behind the research findings in Part 1

Chapter 12 - The science behind this book

  • Primary and secondary research
    • Primary research - collecting new data by the research team
    • Secondary research - utilizes data collected by someone else.
  • Qualitative and quantitative research
    • Research presented in this book is quantitative, because the data was collected using a Likert-type survey instrument
  • Six types of data analysis (according to Dr. Jeffrey Leek's framework)
    • Descriptive
    • Exploratory
    • Inferential predictive
    • Predictive
    • Causal
    • Mechanistic
    • (The analyses presented in this book fall into the first three categories)

Chapter 13 - Introduction to Psychometrics

  • Questions on the research: Why use surveys? Can you trust the data collected with surveys?
  • "Latent construct" is a way of measuring something that can't be measured directly
    • E.g. "organizational culture"
    • Helps us think carefully about what we want to measure and how we are defining our constructs.
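
To make the idea of a latent construct more concrete, here is a minimal Python sketch that turns several Likert-type survey items into one score per respondent and checks how consistently the items move together. Everything here is an assumption for illustration: the 7-point scale, the three items, the sample answers, and the function names. Cronbach's alpha is used as one common internal-consistency statistic; these notes do not record which statistics the researchers actually ran.

```python
from statistics import mean, variance


def construct_scores(responses: list[list[int]]) -> list[float]:
    """Average each respondent's item answers into a single construct score."""
    return [mean(answers) for answers in responses]


def cronbach_alpha(responses: list[list[int]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])                      # number of survey items
    items = list(zip(*responses))              # regroup answers per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(answers) for answers in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up answers: 4 respondents x 3 items, each on a 1-7 agreement scale,
# all items meant to tap the same construct (e.g. "organizational culture").
responses = [
    [7, 6, 7],
    [5, 5, 6],
    [2, 3, 2],
    [4, 4, 5],
]
scores = construct_scores(responses)
alpha = cronbach_alpha(responses)
print(scores, round(alpha, 3))
```

A high alpha suggests the items behave like measurements of one underlying thing, which is what makes averaging them into a single score defensible.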

Chapter 14 - Why Use a Survey

  • Discussion on surveys vs "system data"
  • Surveys allow you to collect and analyze data quickly.
  • Measuring the full stack with system data is difficult
  • Measuring completely with system data is difficult
  • You can trust survey data
  • Some things can be measured only through surveys.

Part Three - Transformation

  • Chapter by Steve Bell and Karen Whitley Bell on leadership and organizational transformation

Chapter 16 - High-Performance Leadership and Management

  • Leadership has a powerful impact on results.
  • Component for sustaining competitive advantage (in addition to technical performance): A lightweight, high-performance management framework that:
    • connects enterprise strategy with action
    • streamlines the flow of ideas to value
    • facilitates rapid feedback and learning
    • capitalizes on and connects the creative capabilities of every individual...
  • Case study from ING Netherlands, some picks
    • You have to understand why, not just copy the behaviors
    • The work itself will constantly change; the organization that leads is the one with the people with consistent behavior to rapidly learn and adapt.
  • Summary at https://bit.ly/high-perf-behaviors-practices