12th March 2020: More thoughts on structuring info flows

It’s been an interesting couple of weeks, so I’ve gone back to form and bashed out some weeknotes for the last hour. The first two parts touch on some aspects of software engineering which might be of interest to senior developers, and people managing engineers. The third part describes what we’re up to around Coronavirus preparations and remote working, but is more about how to deal with information in a crisis.

Oh yeah, here’s my personal intro in case you’re interested.

Engineering is cartography

It’s been a code-heavy week, down in the bowels of the codebase. Last week I’d half-expected to be managing the work for the team bringing together a new, complex release, but the team have stepped up and sorted this out. It’s actually been really nice to get my teeth into some proper engineering. Headphones on, Slack out the way, IDE in full force. There are moments when software engineering is that odd cross between the Matrix and an artist’s studio, and one can absorb oneself not in code, per se, but in a system. The joy is not in the code, but in the flow of information that the code allows, how the assembled parts work together to make something elegantly smooth, and what I now think of as the ch’i of circuitry emerges into something stable, formed, useful.

Specifically (here comes the science bit) I’ve been adding a cache file to the code that runs every time a page is loaded, and have naturally hit two of the chief tenets of software engineering, namely:

Cache invalidation is always harder than you figured, and
Developers (yeah, yeah, including me) always expect things to be easy.

There were a couple of hitches – the first was a foreseeable one (wow, different sites use different folders, duh), while the other was less so (Linux folders are case sensitive, but Windows ones aren’t, riiiight). Still, both fairly minor, and when you have the problem space “loaded into your head”, a good developer should know where to hit it.

I think of this as “interjection points”. A good software engineer will, I believe, not give you a time estimate to fix something, but a time estimate to map a system out in order to understand it. If the estimate to fix it doesn’t fall naturally out of that, then the engineer needs more sketching experience, or your codebase is about to be unmaintainable realllllly soon. Pray it’s the former.
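For the curious, the two cache hitches above can be sketched in a few lines. This is a rough illustration only, written in Python rather than the PHP of the real codebase, and every name in it (FileCache, normalise_key, page-cache.json) is made up for the example. It assumes the cache lives under each site’s own root folder, that keys are lower-cased so Linux and Windows agree on them, and that an entry is stale once the source file’s modification time changes.

```python
import json
import os
from pathlib import Path


def normalise_key(path: str) -> str:
    # Hypothetical helper: treat "Theme\\Header.php" and "theme/header.php"
    # as the same entry. Trade-off: on Linux, two files whose names differ
    # only by case would share a key.
    return path.replace("\\", "/").lower()


class FileCache:
    """Hypothetical per-site cache file, invalidated by source mtime."""

    def __init__(self, site_root: Path, cache_name: str = "page-cache.json"):
        # Each site keeps its cache under its own root, not a hard-coded folder
        self.cache_file = Path(site_root) / cache_name
        self.entries = {}
        if self.cache_file.exists():
            self.entries = json.loads(self.cache_file.read_text())

    def get(self, source: Path, build):
        key = normalise_key(str(source))
        mtime = os.path.getmtime(source)
        entry = self.entries.get(key)
        if entry is not None and entry["mtime"] == mtime:
            return entry["value"]  # source unchanged, so the entry is still valid
        # Missing or stale: rebuild, then write the whole cache back to disk
        value = build(source)
        self.entries[key] = {"mtime": mtime, "value": value}
        self.cache_file.write_text(json.dumps(self.entries))
        return value
```

Calling cache.get(path, build) twice in a row only runs build once. Touch the source file and the next call rebuilds. Even in a toy like this, most of the decisions are about invalidation rather than storage, which is rather the point.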

Agile development requires continual balancing

After the caching was committed and tested, I fixed up some other fundamental code which was breaking our unit tests. While the code structure itself was ok, our outdated phpunit version wasn’t happy with it, so it was a mix of a bad stack, dodgy unit testing, and slightly whiffy software engineering. Each, any and all of these three are quite hard to explain, it turns out. “Upgrading some software” is the easiest, but it led to a big story-point estimate during the sprint meeting. Still, I managed to set aside a day to look into it, to see if we could get a quicker fix through re-engineering the code.

In the end, yes, upgrading phpunit is the right thing to do, but this was the wrong moment. A ‘workable fudge’ was put in that half-hit all 3 of the requirements (code works; unit tests run; engineering is sane), with the caveat that it would be taken out with the upgrade, expected in a few months.

This has taught me a few things:

Acknowledging a bigger issue and having a timescale for fixing it allows you to have more authority when putting the problem off in the short term.
Sometimes you just have to ask for time, instead of trying to explain yourself. I’m pretty sure I said that recently too.
Software development is always a balance, and always has a context of delivery. This constant act of balance is fundamental to agile development on short timescales. FUNDAMENTAL.

Anyway, we’ve saved ourselves the huge upgrade for now, so it was a good morning’s work.

Reacting as management to a crisis

Those two coding tasks were basically my commitment for the sprint, which freed me up a bit. I had (or have, still) various admin things to do. But from Wednesday into Thursday the government and media started taking the Covid-19 pandemic more seriously, and it became clear that we needed to ramp up our own precautions and efforts to match. As a company, you need to follow the course of the news sometimes. Is it a “crisis”? I don’t know, but I do think that if people are worried, you should treat it as one, while desperately trying not to panic yourself.

In general, we’re lucky to be a digital company (whereas others are not so) – most of us can and do already work remotely, so we have a lot of infrastructure in place, and had cobbled together some general thoughts on remote working some months ago.

However, the current scenario throws in a few extra challenges, namely:

Information updates fast, in both the wider world and inside the team
We may be ‘hyperdisrupted’, in that nobody may be allowed into the office at any time
People are personally, individually more aware and anxious, for good reason

I’ve learned (sometimes the hard way) that a crisis needs clarity, not vagueness, and the best thing you can do to help people get a sense of psychological safety is to let people raise concerns, get clear actions in place, and be transparent. I’ve not done these in previous times, and it tends to end up pretty badly. Things might ‘survive’ in a holding pattern for a while, but it leads to a lack of trust, and ineffective collaboration, even before the illness actually hits.

(I do wonder if some mentalities prefer the vagueness of an ongoing crisis, the idea that there is a situation or an “enemy” to feel like there is a purpose, but that there is no real will or skill to turn that into something more progressive and real. Ah well, I enjoyed a day of contingency planning anyway.)

As per software engineering, I’ve been thinking through information flow and how the structures that get set up support and/or hinder that. My overall strategy has been along the lines of:

Gather concerns and blockers: Start a Slack channel. Ask initial questions about basic challenges, like working from home. Send out a survey using Google Forms. The key thing here is to start a pipeline for information, not to capture every single possible problem.
Capture everything clearly: Started a new Google Doc to cover guidance, decisions, questions, ongoing actions – half for me to keep track of things, but half as an instant information-publishing route. Put the important stuff to the top, and don’t be afraid to remove or move it once it’s no longer important – for instance, washing hands and getting anti-bacterial gel started off as the top priority, but now it’s a matter of course.
Work out how to make decisions quickly: This is not something we have an assigned role for. We adopted a ‘rule of 3’ among the (4) senior management team members, taken from Rick Falkvinge and the Pirate Party as an approach to avoid hierarchical bottlenecks.
Capture decisions relentlessly: People are away a lot, which is a real challenge if things are moving fast. I’ve taken to adding all decisions into the single Google Doc as appropriate, adding a comment to note which day the decision was made on, and including the decision in a separate “update log” at the bottom of the doc. Hopefully this way I can just point people at these if/when they want to catch up. And then use Slack and daily stand-ups to communicate important (team-wide) decisions out. Otherwise, make sure relevant people are told individually.
Address actions quickly: Ultimately, trust in management comes down to whether concerns are listened to, risks are seen to be understood, and both are addressed proactively. (Unforeseen events are OK to be addressed reactively, but only if they were truly unforesee_able_. As a kicker, events are always more foreseeable a) if you’re not already responding to a dozen other things, and b) in hindsight. Tchah.)

So, that’s been fun. My main disappointment so far is not being able to think up a decent codename for the day for trialling everyone being remote, except for ‘TOTAL REMOTE’. I’m sure something will come along to best that though. I can feel hubris in my bones. Til next time.