When we estimate work items (low-level user stories, tasks, etc.), we estimate effort in hours as well as duration, not story points. By the time we do this execution-planning estimation, we have a fairly good idea of the skill level of the person who will be doing the work, and we care about the load on that person and on the team, so the hours estimate is important to us.
We estimate duration separately from effort because we understand that our work does not involve digging trenches. Some items may only require 8 hours of effort, but there is just no way that anyone sits down and executes for 8 hours straight. Besides general (bad) interruptions, humans switch tasks because the variation gives them an emotional and mental break. The work also requires them to solve problems, and they need breaks to incubate lower-level thoughts and decisions, get feedback from others and reach certain mental breakthroughs. Some 8-hour tasks can be done in a day, but it’s much more common to spread the 8 hours of effort over 3 days (while doing some other work in between). We care about this effort/duration combination at the low level of planning because it helps us create more realistic expectations about when things will be delivered, especially given the complex, interdependent nature of software delivery. Very often it really matters whether the app developer will get the YAML for a new API by a certain date, so that they can start their development in time to be ready for a tester to test by another date. Call us traditional, but we see many delivery issues at other companies and at our clients because of the absence of this basic, old-school planning rigour.
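To make the effort/duration distinction concrete, here is a minimal sketch of how the two numbers could be combined for a chain of dependent items. It is an illustration only, not a description of our tooling: the `WorkItem` class, the `hours_per_day` figure (how many focused hours a person realistically gives one item per working day), the sample efforts and the start date are all assumptions made up for the example, and weekends are the only non-working days it knows about.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from math import ceil


@dataclass
class WorkItem:
    name: str
    effort_hours: float   # estimated hands-on effort
    hours_per_day: float  # assumed focused hours per working day on this item

    def duration_days(self) -> int:
        # Duration in working days: effort spread over the realistic daily allocation.
        return ceil(self.effort_hours / self.hours_per_day)


def add_working_days(start: date, days: int) -> date:
    # Step forward one calendar day at a time, counting only Monday to Friday.
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:
            remaining -= 1
    return current


def schedule_chain(items: list[WorkItem], start: date) -> list[tuple[str, date, date]]:
    # Each item starts on the day the previous one is handed over.
    plan = []
    cursor = start
    for item in items:
        handover = add_working_days(cursor, item.duration_days())
        plan.append((item.name, cursor, handover))
        cursor = handover
    return plan


# Hypothetical chain: API contract -> app development -> testing.
chain = [
    WorkItem("API YAML", effort_hours=8, hours_per_day=3),
    WorkItem("App development", effort_hours=24, hours_per_day=4),
    WorkItem("Testing", effort_hours=8, hours_per_day=4),
]
plan = schedule_chain(chain, start=date(2024, 1, 8))  # a Monday

for name, starts, handover in plan:
    print(f"{name}: starts {starts}, handed over {handover}")
```

With these made-up numbers, the 8-hour API item takes 3 working days rather than 1, and that difference carries through to every date after it in the chain, which is exactly why we estimate both figures.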
From a control point of view, realistic effort and duration estimates also provide broader observability and leave less room for plausible deniability when someone does not deliver, so that the relevant intervention can be made. The feedback cycle becomes immediate and ongoing, and impediments become visible earlier, because a missed date means something must have impeded the delivery. However, the effort and duration estimates should not become a blunt object to blindly whip knowledge workers with; that is counterproductive. Done right (not too far in advance and not on large, poorly defined work items), estimating helps create intensity without creating tension, a cultural norm we value.
We find the underlying principle behind story points useful for expressing a gut feel of the complexity of large and/or complex items: not user stories, but larger initiatives, opportunities and feature sets. That makes “story” point a bit of a misnomer, although the term is still commonly used. Sometimes we do need to think about duration and delivery timelines on large initiatives, and then we talk about high-level durations. But whenever we have to compare and rank a list of larger, more complex initiatives (not stories), we find it useful to apply the concept of story points, which we simply call a complexity score. We don’t use t-shirt sizing because a linear scale such as small, medium, large does not allow for relative comparison, and most things just end up being sized as medium. The Fibonacci sequence commonly associated with story points remains useful for our complexity score, so our items will typically have complexity scores of 1, 2, 3, 5, 8, 13, 21, etc.
The items we think about in terms of complexity score are never small enough to be delivered in a few days or even two weeks. We don’t believe in two-week time boxes anyway; in our experience that is simply not enough time to go from concept to click, and it defeats the purpose of realistic time boxing.