
Continuous Improvement


  • Status: Complete
  • Category: Delivery
  • Default enforcement: Soft
  • Author: PushBackLog team


Tags

  • Topic: delivery, agile, process
  • Skillset: management, engineering
  • Technology: generic
  • Stage: operations, planning

Summary

Continuous improvement is the ongoing, incremental process of examining how a team works and systematically making it better. It is not a quarterly initiative or an annual planning exercise — it is a regular, structured habit embedded in the team’s operating rhythm. Retrospectives, blameless post-mortems, and explicit action item tracking are its primary practices.


Rationale

Delivery processes that are never examined degrade over time. What worked for a team of four does not work for a team of twelve. What worked for a product with three users does not work for a product with three thousand. Without deliberate reflection, teams carry process debt that compounds as quietly as technical debt.

Continuous improvement is the feedback loop that prevents this accumulation. It creates a team culture where inefficiencies are surfaced without fear, changes are made incrementally rather than in disruptive overhauls, and learning from failures is an asset rather than a liability.


Guidance

The retrospective

Retrospectives are the primary mechanism for continuous improvement at the team level. Scheduled at the end of each iteration or sprint, a retrospective answers three questions:

  1. What went well?
  2. What could be improved?
  3. What will we commit to changing in the next iteration?

The third question separates a useful retrospective from a venting session. Without committed actions, a retrospective is theatre.

Effective retrospectives:

  • Are time-boxed (60–90 minutes for a two-week sprint)
  • Produce no more than 2–3 action items — focus beats volume
  • Assign a specific owner to each action item
  • Review previous action items at the start of each retrospective
  • Rotate the facilitator to distribute accountability and perspective
  • Are psychologically safe — if the team cannot honestly share problems, the retrospective cannot function

Action item management

Action items from retrospectives live in the team’s backlog, not in meeting notes. Untracked action items are not commitments — they are aspirations. The team should review outstanding improvement actions alongside technical work in sprint planning.
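
As an illustration, tracked action items can be treated as first-class backlog entries with an owner and a due sprint. This is a minimal Python sketch with hypothetical names (`ActionItem`, `outstanding`), not a reference to any particular backlog tool:

```python
from dataclasses import dataclass

@dataclass
class ActionItem:
    description: str
    owner: str
    due_sprint: int
    done: bool = False

def outstanding(items, current_sprint):
    """Actions to review in sprint planning: not done, and due this sprint or overdue."""
    return [i for i in items if not i.done and i.due_sprint <= current_sprint]

# Hypothetical backlog entries mirroring the guidance above
items = [
    ActionItem("Add CI duration metric to dashboard", owner="Marcus", due_sprint=14),
    ActionItem("Document Friday merge freeze", owner="Jordon", due_sprint=13, done=True),
]
print([i.description for i in outstanding(items, current_sprint=14)])
# → ['Add CI duration metric to dashboard']
```

Because the items carry a due sprint, the planning meeting can filter for what is overdue rather than rereading old meeting notes.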

Retrospective formats

A single format becomes routine and loses effectiveness over time. Rotate between formats:

  • Start / Stop / Continue: what to begin doing, stop doing, and keep doing
  • 4Ls (Liked / Learned / Lacked / Longed for): captures both positive and aspirational signals
  • Sailboat: winds = accelerators; anchors = impediments; rocks = risks
  • Mad / Sad / Glad: emotional register; useful when team morale is a concern
  • Five Whys: drill into the root cause of a specific problem

Blameless post-mortems

When an incident occurs, a blameless post-mortem examines what happened without assigning personal culpability. Systems fail, not people.

A post-mortem document includes:

  • Timeline of events
  • Root cause and contributing factors (systemic, not individual)
  • What went well in the response
  • What could be improved
  • Action items with owners and due dates

Post-mortems are shared across the organisation. Circulation of post-mortems is one of the most effective learning mechanisms available — teams benefit from incidents they did not experience.

Kaizen mindset

Kaizen (Japanese: “change for the better”) is the philosophy underlying continuous improvement. The kaizen mindset holds that:

  • Small, incremental improvements compound over time into significant capability changes
  • Everyone on the team is empowered to identify and propose improvements — continuous improvement is not a management function
  • Improvements should be tried quickly, measured, and kept or reverted based on evidence
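
The third point can be made concrete with a small sketch: assume a single lower-is-better metric (say, median cycle time in days) measured before and after a trial period. The function name and the 10% threshold are illustrative assumptions, not a prescribed rule:

```python
def keep_change(baseline, after, improvement_threshold=0.10):
    """Keep a process change if a lower-is-better metric improved by at least the threshold."""
    return (baseline - after) / baseline >= improvement_threshold

# e.g. median cycle time in days, before and after a two-sprint trial
print(keep_change(baseline=8.0, after=6.5))  # improved ~19% → True (keep)
print(keep_change(baseline=8.0, after=7.8))  # improved ~2.5% → False (revert)
```

The point is not the arithmetic but the discipline: the decision to keep or revert is made against evidence gathered during a bounded trial, not by whoever argues loudest.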

Common failure modes

  • Retrospectives with no actions: good conversation, no commitments, nothing changes
  • Untracked action items: items captured in meeting notes are never reviewed; the same issues resurface each sprint
  • Blame in post-mortems: root cause recorded as “human error”; systemic causes ignored; fear prevents future honesty
  • Same retrospective format forever: the format becomes ritual; the team goes through the motions with pre-formed answers
  • Retrospectives cancelled under delivery pressure: the sessions dropped first when time is scarce are the sessions most needed

Examples

A retrospective run-through

A two-week sprint ends. CI pipeline duration has crept from 6 minutes to 18 minutes over the last month. The team uses the Start / Stop / Continue format:

What went well (Continue)

  • Daily standups were focused and under 15 minutes
  • Pair programming on the tricky auth refactor caught three bugs before review

What could be better (Stop/Start)

  • Stop: Merging large PRs on a Friday afternoon
  • Start: Treating CI duration as a tracked metric with an alert threshold
  • Start: Time-boxing spike tasks to one day before escalating

Actions committed (maximum 3):

  • Add CI duration metric to team dashboard; alert if > 10 min (Owner: Marcus; Due: Sprint 14)
  • Agree a team norm: no PR merges after 3pm Friday (Owner: whole team, Jordon to document; Due: this sprint)

These two actions go into the sprint backlog as stories with acceptance criteria. The retrospective at the end of Sprint 14 opens by reviewing whether they were completed.
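
The first committed action could be sketched as follows. The function name, the hard-coded 10-minute threshold, and the idea of feeding it a list of recent run durations are assumptions for illustration; a real team would wire this into its CI or dashboard tooling:

```python
THRESHOLD_MIN = 10  # alert threshold agreed in the retrospective

def ci_duration_alert(recent_durations_min):
    """Return an alert string if the median of recent CI run durations exceeds the threshold."""
    ordered = sorted(recent_durations_min)
    median = ordered[len(ordered) // 2]
    if median > THRESHOLD_MIN:
        return f"CI duration {median} min exceeds {THRESHOLD_MIN} min threshold"
    return None

# Recent runs from the scenario above: pipeline has crept from 6 to ~18 minutes
print(ci_duration_alert([6, 7, 18, 19, 18]))
# → CI duration 18 min exceeds 10 min threshold
```

Using the median rather than the latest run keeps one unlucky cache miss from paging anyone.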

Blameless post-mortem excerpt

Incident: API gateway returned 503s for 11 minutes on 14 Jan, affecting ~800 users.

Timeline

  • 14:02 — Deployment of v2.4.1 completed, monitoring green
  • 14:11 — PagerDuty alert: error rate 8x baseline
  • 14:14 — On-call engineer determines cause: new health check endpoint returning 500 due to missing env var in production secret set
  • 14:22 — Env var added, service rolled forward; error rate returns to baseline
  • 14:23 — Incident resolved

Root causes

  1. Pre-deployment checklist did not include verification of new required env vars
  2. Staging env var set was complete (manually maintained); production set had diverged

Contributing factors (systemic, not individual)

  • No automated diff between staging and production environment variable sets
  • Health check error did not surface during deployment smoke tests because the new endpoint was not covered

Actions

  • Add automated env var diff check to deployment pipeline (Owner: Todd; Due: Sprint 15)
  • Add new health check endpoints to smoke test suite (Owner: Christy; Due: Sprint 15)


Part of the PushBackLog Best Practices Library.