How do we think about the datas?

Disclaimer: This is not a thought-out blog post like you might find on the real internet. This post is fueled by a few extra hours of sleep, black instant coffee, and a general interest in and enthusiasm for the data debate which I haven’t had in years. This is a post to sort out some of my own thoughts before being able to engage more fully.

I also really really liked the idea of writing a blog post in response to a blog post. It feels like 2005 all over again.

Context: Dan Barrett, UK Parliament’s Head of Data and Search, has posted a couple of posts about the difficulty of talking about data – part 1 and part 2, with ensuing Twitter conversation.

This gave me wobbly memories going 7 years back, when ‘open data’ was getting ‘hot’ and trying to find its way in the world. Self-interested plug: I went back and dug out my old post, “Open Data” needs to die, to see if it was still relevant. Some of it seems to be, namely the need for context and the semantic quibbling that goes on.

(The idea of “agile data” is a new idea for me though – this makes a lot of sense for my own job, and has a rich depth waiting to be explored.)

Perhaps this bit from the old post gets at some of the difficulty:

Many people with useful, everyday data and databases really don’t think in terms of data. Because the data is about stuff they know, they think of it as “information”. Maybe even a “resource”. But ask them what “data” they have and they’ll probably give you a back-up of their website.

Is the term ‘data’ just too vague to be useful? If you had a magazine all about Data, then what would it cover? Databases, database design, relational data, non-structured data, data dumps, big data, personal data, data security, — yeh, even my eyes are glazing over with the word now. Would I buy it? No, probably not.

Does the word “data” need to die?

No – emphatically not. I think it does mean something to me, as a computer scientist engaged in data on a daily basis. It is the raw material that underpins everything I do. BUT it’s hard for me to say that ‘data’ is this, that or the other.

On a daily basis, data for me covers not just the stuff we make available on Local Insight, but anything we’ve decided to commit to a database for the purpose of structuring it, processing it, linking to it, etc. As a second order, I also consider all of our files to be data, just structured in a different way.

So maybe there are two ways that we think about data (and yes, I think a lot of the confusion is not just how we explain what data is, but how we make sense of the term. And this is, in a sense, just a semantic argument. But it needs to be a semantic argument if we’re talking about how to talk to each other). Two ways that often conflict with each other:

  1. Bits stored in computers. As in, 0s and 1s that give the computer something to do. Data in its ‘purest sense’. This is so generic it hurts, and yet it draws a useful line between digital and analogue processing, the latter being, let’s face it, pretty much how humans like to think. “It looks like rain” is much easier to think than “There is a 47% chance of rain.”
  2. Structured information. This is different to ‘pure information’: it takes the content that underpins information and gives it structure, meaning shape, consistency, and something predictable which allows us, and others, to work with it more easily. At this point, the stuff that sits where information and data intersect has become the stuff of science.
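
For the coders reading, here is a rough sketch of those two views applied to the same thing, in Python. Everything in it is made up for illustration: the forecast record, the place name, and the function names `as_bits` and `as_record` are my own assumptions, not anything from Local Insight. The point is only that identical bytes can be read as #1 (0s and 1s) or as #2 (named fields you can actually work with).

```python
import json

# Hypothetical forecast record, invented purely for illustration.
raw = b'{"place": "Brighton", "rain_chance": 0.47}'


def as_bits(payload: bytes) -> str:
    """View 1: the payload as 0s and 1s, one 8-bit group per byte."""
    return " ".join(f"{byte:08b}" for byte in payload)


def as_record(payload: bytes) -> dict:
    """View 2: the same payload parsed into named, predictable fields."""
    return json.loads(payload.decode("utf-8"))


bits = as_bits(raw)
record = as_record(raw)

# View 1 tells you nothing unless you already know the encoding.
print(bits[:35])

# View 2 lets you ask a question of the data directly.
print(record["place"], record["rain_chance"])
```

Nothing about the bytes changes between the two functions. What changes is that the second one relies on an agreed structure (here JSON, chosen only because it is familiar), and that agreement is the bit that takes some training to see.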

This distinction is possibly useful because everyone has different backgrounds – data and computing and digital and tech are still really divided when it comes to skillsets across society as a whole.

I don’t like putting people into one camp or the other, but broadly speaking, I think it takes fairly specialist skills to understand Data as #2, whereas having a vague idea of #1 is more of a default, and without that training, it’s easy to think of structured data as just 0s and 1s.

OK I’m out of words for now. I wanted to get this down as a thought-clear cos I think there are some really interesting questions coming out of it, and I also want to go back and re-read the other points floating around.