
Data Product Excellence
Notes on systems and design in data-rich environments
About

I work at the intersection of data, product, and systems.Over 15+ years, I’ve helped fintech leaders and mission-driven teams build lasting data capabilities through user-centric design, product focus, and thoughtful architecture.
Writing
Data as a Complex but Incomplete TruthAnyone who has ever made a pivot table has thereby encountered epistemology - the philosophy of knowledge. Every dataset, no matter how big, turns out to paint only an approximate picture. One problem is that it leaves out a lot. But even the part that it does capture, needs to be filtered and summarized before it can be uploaded into a human mind.It becomes crucial to think about the lens with which to see it. Should you create categories out of a continuous column? Should you just depend on the mean? Are there outliers? Even if there are no outliers today, can there be outliers tomorrow? What should you do with the outliers? Aren’t outliers the whole point of studying a pattern? And if there are no outliers today, how to even begin to decide what an outlier could look like tomorrow?One can find it very easy to be carried away. This is why data science and data engineering are two separate disciplines. The engineer doesn’t get carried away, he is a plumber at heart. The water is either stationary or it's flowing, it’s either leaking or making it through to the other side. The scientist, on the other hand, needs room to speculate, to be creative with his hypotheses, to worry about where to draw the line.This is not to say that the scientist is somehow superior to the engineer. In fact the opposite. In my own journey I have been an analyst, then a scientist, and finally an engineer. The analyst and the scientist can be ignored, but you need the engineer. There is a quiet dignity in reporting things exactly as you found them, without loss or embellishment.There is also the matter of ethics. Unfortunately it is talked about a lot more than it is practiced. One can argue that our ongoing enslavement by algorithms was brought upon by brilliant but unthinking data scientists, rather than the data engineers who merely captured the true essence of our behaviour and passed it on to whomever claimed access.Why Centralized Systems FailI want to debunk the often repeated idea of building a “central repository” with “all the information in one place” and a structure so well thought out that it would be “easy to find anything and everything”.The fact is, I have no faith in monolithic systems that promise the world. They take a lot of upfront investment, and building them requires you to imagine a lot of stuff that may or may not occur in reality. Once built, such systems become ridden with hardcoded bureaucratic business logic that forces users to stick to a workflow long after it has stopped being useful.A ready example is the centralized train booking system, IRCTC. There are thousands of trains and millions of users. One could imagine lots of real world scenarios - frequent travelers, first time travelers, overnight, groups, multiple stops - booking two weeks or two hours or four months before the travel - but the expectation is that the same database and the same user interface and the information architecture is going to serve them all.Some may say, but it does solve a problem at a massive scale. To which I say, yes, but at what cost? The frustration of working with a tool that does not fit its purpose is a very specific malaise of our modern times. We encounter these tools on a daily basis - from our banking portals, to the intractable menus of our cloud service providers, to controls on content, many aspects of our modern lives appear to be designed for someone like us, but not us exactly.I believe that the building of tools, especially data-intensive software tools, need not be rooted in big assumptions and big investments, but can be done iteratively over a period of time. It does require data product builders who are in close touch with their users, as well as users who understand the process of building iteratively and can participate meaningfully in such a process.At its core, a data product is nothing but a mission to find the most useful version of a complex truth, and it needs a discerning user as much as it needs a data engineer or a data scientist.Building with EmpathyThe path from building useful things for yourself, to building useful things for others, goes through empathy. All the literature on design thinking, like all the literature on product management, emphasizes empathy. Empathy is the ability to put yourself in your user’s shoes.Empathy asks us to do the impossible - to not see the world from our own vantage point, to forego our own role at the center of the universe. How can we possibly feel what someone else is feeling, unless we are their conjoined twin?Perhaps they mean it in only a practical sense and not in a literal sense. They are asking you to try your best to be empathetic. It's not so much that you start seeing things from your user’s perspective, but that you stop seeing things from your own perspective to such an extent as to make your conclusions completely subjective.Hard as it is to empathize with one person, it is downright impossible to empathize with many. This is why, the bigger the scale of a product, the harder for its builders to claim empathy with their users. Can we claim the product builders at Instagram and Tiktok empathize with their users’ addiction to doom-scrolling?Perhaps it’s a matter of ethics. Stephen Batchelor, the secular Buddhist thinker and author of Buddhism Without Beliefs and Confession of a Buddhist Atheist, says that “Empathy is… the recognition that another person is as real as I am.” It is contingent upon us, as moral beings, to understand the pain of others the way we understand our own pain. Building useful things can be a way to connect with our users in a deeper and more authentic manner. Empathy is born out of connection, not the other way round.For someone who works with data, someone who is trying to create the most useful version of the truth, empathy involves finding out what it is that the user seeks to know, and will benefit from knowing.Limits of Gathering User Requirements in Data Product DesignMany in the software engineering business have found a purportedly more “objective” way of finding what the user needs - go and ask them. Millions of hours have been spent in the gathering of requirements. This was found to be way harder than it sounds. People often don’t know what they need, or have trouble articulating it. They have to imagine a solution that they do not have the expertise to build themselves. Requirements invariably borrow from things that the users have seen before, curtailing creativity. Oftentimes the imagined solutions are not feasible within the constraints of cost or time. Sometimes the imagined solutions are just impossible wishlists.When it comes to data products, gathering requirements becomes even more of a fool's errand. An analyst asking a manager for what a dashboard should look like, is like a sherpa asking a mountaineer for a map.A consultative approach is needed, to be sure. It’s a matter of connecting with the user, and understanding their pain. But you must stop short of asking them what medicine they want you to give them.It’s also a matter of agency. Whether the analyst sees themselves as a sherpa (or as a doctor, in my mixed metaphors above) is whether they feel confident in their ability to shape outcomes. They can’t merely be a provider of a service.In other words, they need to have influence over the user, through their products, but hopefully beyond that too.Lean Data Products: MVPs for Complex TruthsIf you have ever come across the idea of an MVP - a minimum viable product - you have been exposed to the central premise of the lean startup movement. The most typical adherent of this philosophy is the cash-strapped tech founder. An MVP is a way to test if a problem is real, if its solvable, and if anyone’s interested in it at all.Before the concept of a minimum viable product took hold, the leading methodology for software building was (and still is, to a very large extent) “agile”. Agile practitioners, like Eric Ries, believe in not making too many assumptions but seek constant feedback. They aim to develop a system where product enhancements are constantly being shipped, often in two-week “sprints”. This forces you to think of the most atomic part of your task, something that can be accomplished within one sprint.Agile has its many problems. Most of these have to do with the fact that it's near-impossible to be agile if you are a big group of people.Agile, in practice, becomes a method for coordinating a team, rather than supplying value to the user in an empathetic way. So the best product teams are purposefully small, because small can be agile.For a small team, agile product building becomes synonymous with building a series of MVPs. Most of these MVPs will fail but those that succeed will see more investment and more excitement.The fact that data teams are inherently small, dovetails well with the principles of agile. Data practitioners looking to produce the most useful version of the truth for their clients can therefore be highly successful by building a series of MVPs in quick succession. With the right habits, they can do so sustainably over a long period of time.
Get in touch
If you're working on something messy with data, I'd love to hear about it.
Voicebot Auditor
Independent evaluation of conversational AI systems
We audit voicebots for hidden risks such as bias, toxicity, empathy gaps, inaccuracy, and compliance blind spots
More specifically, we build and run evaluation harnesses, structured tests that simulate real calls at a fraction of the cost, and measure performance across dimensions like safety, empathy, and accuracy.- Custom prompt libraries tailored to each domain (especially fintech and heathcare)- Metrics you can track as your own system evolves- Independent scoring, separate from your engineering team
Who It’s ForIf you are operating an AI voicebot in a regulated environment, if you are iterating rapidly and want to be sure that you are not causing harm in the process, or if you worry about the fickle nature of the LLMs powering your interactions with real human beings, get in touch to sleep better at night.