Input -> System -> Output: A paradigm for life

January 23, 2021
#thinking 

I came across an article that claimed a piece of analysis failed because of ‘bad’ data.

It reminded me of a phrase I used to hear on the trading floor: “US non-farm payrolls beat estimates”.

Is it possible for data to be inherently good or bad? I am not convinced. The only state data should have is that they exist, and they are held as the truth because they exist. In other words, US non-farm payrolls didn’t beat estimates; your estimate was too low.

There must be another reason for the system’s unexpected performance. Consider the following paradigm.

[Diagram: input -> system -> output]

Applied to a machine learning or data science project, this paradigm might give something like the following four steps.

[Diagram: ML system flow]

Putting my paranoid trader’s cap on: four steps mean four modes of failure.
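To make the “N steps, N modes of failure” idea concrete, here is a minimal Python sketch, not the author’s system. It treats each step as a function whose output becomes the next step’s input, and when a step raises an error it reports which step (mode) failed. The step names and their bodies are hypothetical placeholders, since the diagram’s actual steps are not reproduced in the text; only step 2’s label, “move data from there to here”, comes from this post.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, function) pair; each function takes the previous step's output.
Step = Tuple[str, Callable[[Any], Any]]


class StepFailure(Exception):
    """Raised when a step fails, recording which mode (step number) it was."""

    def __init__(self, mode: int, name: str, cause: Exception) -> None:
        super().__init__(f"mode {mode} ({name}) failed: {cause!r}")
        self.mode = mode
        self.name = name
        self.cause = cause


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Feed data through each step in order: input -> system -> output."""
    for mode, (name, fn) in enumerate(steps, start=1):
        try:
            data = fn(data)
        except Exception as exc:
            # Attribute the failure to the step (mode) where it happened.
            raise StepFailure(mode, name, exc) from exc
    return data


# Hypothetical placeholder steps; only step 2's label is taken from the post.
steps: List[Step] = [
    ("collect", lambda raw: [float(x) for x in raw]),
    ("move data from there to here", lambda rows: list(rows)),
    ("model", lambda rows: sum(rows) / len(rows)),
    ("report", lambda mean: f"mean = {mean:.2f}"),
]

print(run_pipeline(steps, ["1", "2", "3"]))  # mean = 2.00

try:
    run_pipeline(steps, [])  # empty input: the model step divides by zero
except StepFailure as err:
    print(err.mode, err.name)  # 3 model
```

In the second call the empty input is still just data that exists; the failure is pinned on mode 3, the step that could not cope with it.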

The rest of this blog will focus on mode 2, variously labelled DevOps, data architecture or data engineering. I never know which label to use, so I’ll stick with ‘move data from there to here’.

The aim of this blog is to challenge myself, and hopefully the reader, to think about big data systems at a level of abstraction that stays constant over time.

Learn to fish and you’ll never be hungry, or something like that.

Stick around and you can expect to explore:

  • Database principles
  • Types of failure
  • Lambda architecture
  • Where today’s tools fit in
  • Lots of hand-drawn diagrams (no one does them anymore)