What is Enterprise Data?
Early in my career, I heard the phrase, “Data is the lifeblood of an organization.” And in spite of how much I loathe that phrasing, it has somehow found a way to stick with me all these years.
Why does it stick with me? Because it’s both true and bullsh*t at the same time.
It’s true that data is the “lifeblood” of enterprises in that it’s nearly impossible to conduct business in the modern world without touching or transmitting electronic data in some shape or form.
It’s bullsh*t in the sense that even though it may be technically correct at some level, in the 30+ years I’ve been doing what I do, very few organizations I have worked for or with have ever wanted to tackle their enterprise data issues in a meaningful way.
Organizing data across the enterprise is like herding cats across an endless prairie littered with landmines.
It exists inside of systems you know about and manage. It exists outside of them, in spreadsheets and other places you don’t want it to be. It exists in places you never would have thought, in a million years, it could exist.
Enterprise data exists everywhere and nowhere. It lives in silos. It lives in warehouses. It lives in lakes with cleverly named “lake houses” adorning their glistening digital shores.
Data, data, everywhere. Unorganized, uncontrollable, unreliable data. Data that, if left to decay, will create stress and friction for enterprise system users.
Data that decays in disconnected databases, locked away in rigid legacy architectures that can’t possibly handle the needs of globally networked markets and society, let alone the explosion in data requirements being brought to the forefront by the siren song of leveraging AI in the enterprise.
I get it. It’s both hard and necessary. And I’m sorry if that doesn’t jibe with what you’re being sold by the AI Bros™.
Let’s rip the band-aid off and get to it.
Key Concepts of Enterprise Data
When I speak of enterprise data architecture, I am focusing on concepts such as:
- Big Data
- Customer Data Platform (CDP)
- Data Analytics
- Data Governance
- Data Lineage
- Data Mesh
- Data Modeling
- Data Storage and Management
- Master Data Management (MDM)
- Metadata Architecture
- Real-time Data Processing
Each one of these is a discipline in itself, but when I reflected on the idea of ‘Enterprise Data Architecture,’ I found this to be a comprehensive grouping that gives a holistic sense of how an enterprise values, stewards, and leverages its data.
What Are the Problems with Enterprise Data?
This may come as a shock to you, but we are living through a really weird period in time right now. ;)
The hype of the AI fever dream - one that doesn’t seem to want to break - has somehow convinced the folks in charge that a technology requiring ultra-high-quality data will miraculously fix underlying data issues that have been punted on for years, if not decades in many cases.
And even though I am bullish on the longer-term implications of Generative and Agentic AI, the state of enterprise technology is on fire right now, and I am going to try to be the adult in the room and focus on fixing the real human and business problems that are holding us back from evolving to bigger and better things.
This was my brain dump of problems that we need to focus on with data in the enterprise.
Siloed Data
When I speak of ‘siloed data’ I am not necessarily speaking of a technology problem.
While enterprise data capabilities and capacity have grown exponentially over the years, the nature of the Enterprise Beast™ has not - what typically gets framed as (or blamed on) a tech problem usually turns out to be a people or process problem at its core.
When we see siloed data, it’s typically in organizations with little coordination between, or oversight over, the various domains (read: fiefdoms) that make up the organization as a whole.
And Conway’s Law isn’t exclusive to code:
> Any organization that designs a system will produce a design whose structure mirrors the organization’s communication structure.
So before we even talk about technology problems or solutions, we need to figure out the politics, power dynamics, and legacy structures standing between where we find ourselves today and the ideal future states we envision.
Fragmented Data
While “siloed data” is in itself technically fragmented data, my brain wanted to split the two because fragmentation in a general sense brings its own set of challenges.
Fragmented data emerges not necessarily because of social or political influence or boundaries, but because of other systemic limitations or human behaviors.
Data fragments are collected all the time: Think of when a salesperson engages with a prospect and captures notes, when someone in operations captures transactional data while processing customer orders, when a visitor submits a form from your website, etc.
It is a design principle of mine to model data to atomic levels of granularity. That may initially appear to create or enable fragmentation, but what I have found is that an intentional disaggregation of data enables future assembly without limiting or legacy constraints.
It is the data that is not approached thoughtfully - the data that appears with each new process or application introduced to the enterprise - that fragments and becomes impossible to locate and contextually reassemble over time.
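To make that distinction concrete, here’s a minimal Python sketch of the difference between a flattened capture and an intentionally atomic one. The `Fact` shape, field names, and example values are my own illustration, not a prescribed model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Flattened capture: several facts fused into one record, with no way
# to reassemble them later in a different context.
flattened_note = {
    "customer": "Acme Corp",
    "last_contact": "2025-03-02 called re: renewal, sent pricing PDF",
}

# Atomic capture: each observed fact is its own record, tied to the
# entity and moment it describes. The fragmentation here is intentional,
# which is exactly what makes future reassembly possible.
@dataclass(frozen=True)
class Fact:
    entity_id: str        # stable identifier for the thing described
    attribute: str        # what is being observed
    value: str            # the observed value
    observed_at: datetime
    source: str           # where this fact was captured

facts = [
    Fact("cust-001", "interaction.type", "phone_call",
         datetime(2025, 3, 2, tzinfo=timezone.utc), "crm"),
    Fact("cust-001", "interaction.topic", "renewal",
         datetime(2025, 3, 2, tzinfo=timezone.utc), "crm"),
    Fact("cust-001", "document.sent", "pricing.pdf",
         datetime(2025, 3, 2, tzinfo=timezone.utc), "email"),
]

# Any future view (a timeline, a renewal report) can be assembled from
# the atoms without being constrained by the shape of the original capture.
renewal_touchpoints = [f for f in facts if f.value == "renewal"]
```

The flattened note can only ever answer the question it was written for; the atoms can be recombined into questions nobody had thought to ask yet.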
Data Locked Away in Apps
Software-as-a-Service (SaaS) made it super easy for individuals and teams within the enterprise to purchase and implement their own cloud-based software applications…many times by simply plunking down a credit card and bypassing not only IT, but corporate purchasing as well.
These applications typically follow a Model-View-Controller (MVC)-style architecture, with a separation between data, logic, and presentation “layers” in the application design.
Because each of these individual applications comes with some sort of data store, and because these app-level data stores aren’t usually integrated with each other, let alone with larger enterprise data stores, the rapid proliferation of cloud-based applications created a sort of “SaaS Sprawl.”
SaaS Sprawl makes the centralized management of these individual, distributed apps difficult enough on its own. Go a step further and take into account the natural fragmentation and siloing that occurs when data isn’t integrated or contextualized across domains and applications.
Now think about holistically managing all of the data and information contained within each of these individual applications, and then integrating with broader enterprise data architectures using common naming standards, coordinating syncing and refreshes across inputs and pipelines, and then on top of all of that being able to traverse data lineage to a source of truth.
Many organizations also have large-scale enterprise applications like Workday, ServiceNow, NetSuite, SAP, or Salesforce. Granted, these applications typically cover entire business processes within a tightly integrated, monolithic tech stack, but they nonetheless introduce new complexities and new mental models with their own internal datasets and data architectures.
It all becomes dauntingly complicated when given enough time and space to expand. This is the most perplexing enterprise data problem I continue to encounter, even in 2025.
Competing Definitions / Schemas
What’s the difference between an Account and a Company? Contact and Person? Vendor and Supplier? Deal and Opportunity?
Do you know? Does your enterprise know?
How? Is this planned and documented somewhere? Is there some sort of common data or metadata model that provides explicit structure and meaning to your data?
What happens if two systems use different terms to describe the same thing? Which one wins? Who decides?
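One way to make those answers explicit is a canonical term map that every system-local schema resolves against. Here’s a minimal sketch; the systems, terms, and canonical names are hypothetical, and the point is simply that the mapping is documented and owned somewhere rather than living in people’s heads:

```python
# A minimal, illustrative canonical term map (Python 3.10+ for `str | None`).
CANONICAL_ENTITIES = {
    "Company": {
        "definition": "A legal entity we do business with.",
        "aliases": {
            "crm": "Account",
            "erp": "Vendor",
            "support_desk": "Organization",
        },
    },
    "Person": {
        "definition": "An individual human associated with a Company.",
        "aliases": {
            "crm": "Contact",
            "marketing": "Lead",
        },
    },
}

def canonical_name(system: str, local_term: str) -> str | None:
    """Resolve a system-local term to its canonical entity, if one is declared."""
    for canonical, spec in CANONICAL_ENTITIES.items():
        if spec["aliases"].get(system) == local_term:
            return canonical
    return None  # an undeclared term is itself a governance finding

print(canonical_name("crm", "Account"))  # -> "Company"
```

Who maintains that map - and who wins when two systems disagree - is the governance question hiding behind the schema question.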
Legacy Data Architectures
I have largely avoided the AI + data discussion to this point, but let’s be real for a second - you can’t do any of the AI stuff being (aggressively) sold to you without a solid underlying data architecture.
Knowing the state of enterprise data architectures…good luck bolting your enterprise AI pilots and proofs-of-concept (POCs) onto that mess.
Even if you are completely mesmerized by the lofty, too-good-to-be-true (as always) promises coming out of Silicon Valley, you have to know in your heart your data is going to need work.
Stop shaking your head. You know I’m right. At some point we all learn that there’s no such thing as skipping ahead.
And that’s just the AI bit. I didn’t even mention all of the amazing architectural possibilities that now exist, assuming enterprises can move beyond legacy thinking and legacy structures to free data up to grow knowledge, capabilities, and relationships across the enterprise (and beyond).
No Source of Truth
Is your data accurate? How do you know? What trusted source do you have available that can tell you, definitively, beyond a shadow of a doubt, that a piece of data or information is what you think it is?
If you work in enterprise tech, you know these are just rhetorical questions…because the enterprise isn’t there yet.
No Lineage
Lineage and History may seem like two sides of the same coin, but there are important distinctions.
When I talk about data lineage, I am talking about the ability to trace data from its current state, back to a known point of origin, taking into account how the data was transformed along the way.
Lineage feels like History, but it’s really about the path through that History - how the current state of the data was arrived at.
Also, History is comparatively easy; Lineage is really hard. If you’re not intentionally architecting for lineage, it won’t exist on its own.
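To illustrate what “intentionally architecting for lineage” means, here’s a minimal sketch where every transformation emits an explicit lineage record that can be walked back to a point of origin. The pipeline names and record shape are illustrative assumptions, not a real system:

```python
from dataclasses import dataclass

# Each step records what it produced, what it consumed, and what it did.
@dataclass(frozen=True)
class LineageStep:
    output_id: str        # identifier of the data produced
    input_ids: list[str]  # identifiers of the data consumed
    transform: str        # what was done (job name, script version, etc.)

lineage = [
    LineageStep("report.q3_revenue", ["warehouse.orders_clean"], "sum_by_quarter v1.4"),
    LineageStep("warehouse.orders_clean", ["staging.orders_raw"], "dedupe_and_cast v2.1"),
    LineageStep("staging.orders_raw", ["erp.order_export"], "nightly_extract"),
]

def trace(data_id: str) -> None:
    """Walk a piece of data back to its point of origin, printing each hop."""
    for step in lineage:
        if step.output_id == data_id:
            print(f"{data_id} <- {step.transform} <- {step.input_ids}")
            for parent in step.input_ids:
                trace(parent)
            return
    print(f"{data_id} is a source (point of origin)")

trace("report.q3_revenue")
```

If the transformations don’t emit records like these as they run, no amount of after-the-fact archaeology will reconstruct them.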
No History
The mental models that we hold for the “history” of data - and the subsequent system designs that manifest from these models - typically come in the form of logs or records that capture changes in the state of a piece of data.
Value ____ was changed from ____ to ____ by user ____ at ____ time.
In many enterprise applications and databases, the true history of data - its values in an observed state at a specific point in time - is flattened into single properties that only indicate who made a change, and at what time the last change was made.
To get data about past states (the “history of history”), you have to explicitly capture the state changes, or the states themselves, at predetermined points in time. This requires intentional design, and as such it is not usually prioritized over more “visible” design goals.
In other words, getting reliable history (let alone lineage) from contemporary or legacy data architectures probably ain’t happening.
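Here’s a minimal sketch of the contrast, with a simple in-memory event list standing in for persistent storage. The flattened record keeps only the last change; an append-only event log lets you reconstruct state as of any moment:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Flattened "history": only the last change survives.
record = {"status": "shipped", "modified_by": "jdoe",
          "modified_at": "2025-03-02T14:05:00Z"}

# Append-only history: every observed state change is kept.
@dataclass(frozen=True)
class StateEvent:
    entity_id: str
    field: str
    value: str
    changed_by: str
    changed_at: datetime

history: list[StateEvent] = []

def record_change(entity_id: str, field: str, value: str, user: str) -> None:
    history.append(StateEvent(entity_id, field, value, user,
                              datetime.now(timezone.utc)))

def state_as_of(entity_id: str, field: str, when: datetime) -> str | None:
    """Replay events up to `when` to recover the value at that moment."""
    value = None
    for event in history:  # events are appended in chronological order
        if (event.entity_id, event.field) == (entity_id, field) \
                and event.changed_at <= when:
            value = event.value
    return value
```

The flattened record can never answer “what was the status last Tuesday?” - the event log can, but only because someone designed for that question up front.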
Data Quality
I mean, do I really need to go into this? Enterprise data typically sucks a big ol’ lemon.
Mutable Data
This is an interesting problem I’ve been running into more and more lately - can your historical data be changed post facto (after the fact)?
If your historical data is mutable (writeable), how can you trust it?
If you’re audited, how do you demonstrate a causal chain to known facts or sources of truth if those sources can be modified over time?
If you produce documents or other artifacts from underlying data, and that underlying data can be changed, how do you reconcile the new discrepancy that emerges between the source data and the (now outdated) static output like a PDF file?
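One common answer is to make history append-only and tamper-evident. Here’s a minimal sketch of a hash-chained log - a toy illustration, not any specific product’s audit mechanism - where any post-facto edit breaks the chain:

```python
import hashlib
import json

# Each entry commits to the hash of the previous entry, so an
# after-the-fact edit anywhere in the log is detectable on audit.
def entry_hash(entry: dict, prev_hash: str) -> str:
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

log: list[dict] = []

def append(entry: dict) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    log.append({"entry": entry, "prev": prev, "hash": entry_hash(entry, prev)})

def verify() -> bool:
    """Recompute the chain; any mutated historical entry fails verification."""
    prev = "genesis"
    for item in log:
        if item["prev"] != prev or item["hash"] != entry_hash(item["entry"], prev):
            return False
        prev = item["hash"]
    return True

append({"invoice": "INV-100", "amount": 5000})
append({"invoice": "INV-100", "amount": 5000, "status": "paid"})
log[0]["entry"]["amount"] = 9999   # post-facto tampering...
print(verify())                    # -> False: the chain exposes it
```

If your “historical” tables allow plain UPDATEs with no mechanism like this, the honest answer to “how can you trust it?” is that you can’t.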
Application Coupling
This problem feels like an amalgamation of other problems I’ve covered, but at the end of the day the concept I was envisioning here was the tight coupling of a data model to its logic and presentation layers.
This is pretty common in larger enterprises that generate their own custom code or configurations on top of purpose-built data models - or on top of abstractions such as wrappers - structuring data storage and retrieval in ways that create new dependencies of their own.
The problem stems from not being able to discern structure or meaning from underlying data without having to view it through a tightly coupled yet abstracted UI, or by reading through code or config files.
IYKYK.
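A tiny illustration of the difference. In the first record, the meaning lives only in the owning application’s code and UI; in the second, structure and meaning travel with the data. The metadata shape here is my own sketch, not a specific standard:

```python
# Tightly coupled: what does a "stat_cd" of 3 mean? You'd have to read
# the application's controller code (or click through its UI) to know.
opaque_row = {"cust_no": 4211, "stat_cd": 3, "val_1": 1825.00}

# Decoupled: the data describes itself, so it can be understood and
# reused without going through the owning application at all.
described_row = {
    "data": {"customer_id": 4211, "status": "suspended", "lifetime_value": 1825.00},
    "schema": {
        "customer_id": {"type": "integer", "description": "Internal customer identifier"},
        "status": {"type": "string", "allowed": ["active", "suspended", "closed"]},
        "lifetime_value": {"type": "number", "unit": "USD"},
    },
}
```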
Classical Integration Models Suck
I really struggle with the fact that there are so many possibilities for enabling the end-to-end flow of data from any number of sources, seamlessly supporting optimized business processes and customer experiences…yet so many enterprises still treat “Integration” as if it’s the dirty word it was when I was coming through the ranks in the 1990s.
I don’t get it. It always feels completely absurd to me that data integration continues to be an overly complicated mess in so many enterprises. Not that it will ever be easy per se, but maybe we can think beyond 20+ year old integration patterns…and maybe we can stop making this so damn hard on ourselves?
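As one hedged sketch of what “beyond 20+ year old patterns” can look like: systems publish changes to a shared stream instead of calling each other directly, so adding a consumer doesn’t mean writing yet another brittle pairwise sync job. The broker here is a toy in-memory stand-in, not real event infrastructure:

```python
from collections import defaultdict
from typing import Callable

# topic -> list of handlers; a stand-in for a real event broker.
subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(topic: str, handler: Callable[[dict], None]) -> None:
    subscribers[topic].append(handler)

def publish(topic: str, event: dict) -> None:
    for handler in subscribers[topic]:
        handler(event)

# Each consumer reacts independently; the producer knows nothing about them.
subscribe("customer.updated", lambda e: print("billing sees:", e))
subscribe("customer.updated", lambda e: print("support sees:", e))
publish("customer.updated", {"customer_id": 4211, "status": "suspended"})
```

Point-to-point integration scales as N×N pairs of systems; publish/subscribe scales as N producers and N consumers, which is a big part of why the old patterns hurt so much.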
Did I miss anything? What are your biggest pain points with enterprise data and data architecture?