Don’t start the pipeline until you understand the business
Do you really know what that pipeline is for?
Why does this even matter?
A lot of technical problems in data engineering come from not truly understanding why the data is needed in the first place. You build a pipeline, create a table, and then it turns out that the analyst, the PM, and you all have a different idea of what an "active customer" is. The result? Three different reports, three different numbers, and zero trust in the data.
The theoretical foundation: What is business modeling?
Business modeling is the process of identifying, defining, and structuring the essential components of a business: its processes, entities, relationships, and rules. In the context of data engineering, business modeling helps translate business needs and logic into data structures and definitions that can be implemented in databases, data warehouses, or data lakes.
A well-designed data model is more than just a technical artifact; it serves as a blueprint for how data is stored, organized, and accessed, ensuring that it supports business operations and analytics effectively. By understanding business modeling, data engineers can ensure that the data architecture reflects real-world business processes and requirements, rather than just technical convenience.
The role of data modeling in data engineering
Data modeling is the backbone of any data engineering project. It involves designing conceptual, logical, and physical models that define how data is structured and related.
Conceptual models provide a high-level view of the data based on business requirements, focusing on entities and their relationships.
Logical models add detail, specifying attributes, keys, and rules, but remain technology-agnostic.
Physical models deal with the actual implementation, including storage formats, indexing, and optimization strategies.
Mastering these models allows data engineers to design systems that are not only functional but also scalable and adaptable to future business needs. Without a strong foundation in data modeling, data engineers risk creating systems that are inefficient, inflexible, or prone to errors.
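To make the three levels concrete, here is a minimal sketch of the same small domain at each level. The entities (Customer, Transaction), their attributes, and the choice of SQLite for the physical layer are assumptions made for illustration only; a real model would come out of conversations with the business.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date

# Conceptual: entities and relationships only, no attributes yet.
CONCEPTUAL_MODEL = {
    "entities": ["Customer", "Transaction"],
    "relationships": [("Customer", "makes", "Transaction")],
}

# Logical: attributes, keys, and rules, still independent of any database.
@dataclass(frozen=True)
class Customer:
    customer_id: int   # primary key
    email: str         # rule: unique per customer

@dataclass(frozen=True)
class Transaction:
    transaction_id: int  # primary key
    customer_id: int     # foreign key -> Customer.customer_id
    amount_cents: int    # rule: must be positive
    occurred_on: date

# Physical: concrete types, constraints, and an index for one engine (SQLite here).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_transaction (
    transaction_id INTEGER PRIMARY KEY,
    customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    amount_cents   INTEGER NOT NULL CHECK (amount_cents > 0),
    occurred_on    TEXT NOT NULL
);
CREATE INDEX idx_transaction_customer_date
    ON customer_transaction (customer_id, occurred_on);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(PHYSICAL_DDL)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)  # ['customer', 'customer_transaction']
```

Notice that the business rule "amounts must be positive" appears at the logical level as a comment and only becomes enforceable (as a CHECK constraint) at the physical level.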
Example: different definitions of "Active Customer"
Imagine you need to build a table of "active customers."
Analyst: An active customer is someone who made at least one transaction in the last 30 days.
PM: An active customer is someone who logged into the app in the last 7 days.
Marketing: An active customer is someone who opened a promotional email this month.
How do you deal with this?
Business modeling means agreeing on what "active customer" really means and encoding that into your data model. Without this, every team builds its own tables, calculations, and reports, and you end up wasting time explaining why the numbers don’t match.
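A small sketch shows how quickly the three definitions drift apart on the same data. The customers, dates, and field names below are made up for illustration, and the window boundaries (for example, whether "last 30 days" includes day 30) are themselves assumptions that would need to be agreed on.

```python
from datetime import date

AS_OF = date(2024, 6, 30)  # fixed reporting date so the example is reproducible

# Hypothetical per-customer activity snapshot.
customers = {
    "alice": {"last_transaction": date(2024, 6, 10),
              "last_login": date(2024, 6, 27),
              "last_email_open": date(2024, 5, 20)},
    "bob":   {"last_transaction": date(2024, 4, 1),
              "last_login": date(2024, 6, 28),
              "last_email_open": date(2024, 6, 2)},
    "carol": {"last_transaction": None,
              "last_login": date(2024, 6, 25),
              "last_email_open": None},
    "dave":  {"last_transaction": date(2024, 6, 29),
              "last_login": date(2024, 3, 3),
              "last_email_open": date(2024, 5, 15)},
}

def within_days(event_date, days, as_of=AS_OF):
    """True if the event happened fewer than `days` days before `as_of`."""
    return event_date is not None and 0 <= (as_of - event_date).days < days

def same_month(event_date, as_of=AS_OF):
    """True if the event falls in the same calendar month as `as_of`."""
    return event_date is not None and (
        (event_date.year, event_date.month) == (as_of.year, as_of.month)
    )

# One rule per team, taken from the three definitions above.
definitions = {
    "analyst": lambda c: within_days(c["last_transaction"], 30),
    "pm": lambda c: within_days(c["last_login"], 7),
    "marketing": lambda c: same_month(c["last_email_open"]),
}

active_by_team = {
    team: sorted(name for name, c in customers.items() if rule(c))
    for team, rule in definitions.items()
}
for team, names in active_by_team.items():
    print(team, len(names), names)
# analyst 2 ['alice', 'dave']
# pm 3 ['alice', 'bob', 'carol']
# marketing 1 ['bob']
```

Four customers, three definitions, three different counts, and no single customer is "active" under all of them.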
Communicating with analysts and PMs
A data engineer’s role isn’t just to “push data” through pipelines. It’s essential to engage with analysts and product managers to understand the purpose behind the data. Far from being a waste of time, these conversations save you from having to redesign tables and pipelines every month.
For example, during a meeting with the PM about a report on active customers, you could ask:
How exactly do we define “active customer”?
Which metrics are the most important?
How often should the data be refreshed?
Are there any edge cases, such as trial users or VIP customers?
Who else uses this metric? Is it shared with Marketing or Finance?
By clarifying these details upfront, you can design a data model that fits the business needs precisely and avoid costly surprises down the road.
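One way to keep the answers from getting lost in meeting notes is to record them as a small, versionable definition. The sketch below is only an illustration: `MetricDefinition`, its fields, and every value in it (a 7-day login window, daily refresh, trial users excluded) are hypothetical answers, not something the questions above prescribe.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class MetricDefinition:
    name: str
    event: str                    # which event counts as activity
    window_days: int              # look-back window in days
    refresh: str                  # how often the data is rebuilt
    excluded_segments: frozenset  # edge cases agreed with the PM
    consumers: tuple              # teams that rely on the metric

# Hypothetical answers captured after the meeting.
ACTIVE_CUSTOMER = MetricDefinition(
    name="active_customer",
    event="login",
    window_days=7,
    refresh="daily",
    excluded_segments=frozenset({"trial"}),
    consumers=("Product", "Marketing", "Finance"),
)

def is_active(last_event_on, segment, as_of, spec=ACTIVE_CUSTOMER):
    """Apply the agreed definition to a single customer."""
    if last_event_on is None or segment in spec.excluded_segments:
        return False
    return 0 <= (as_of - last_event_on).days < spec.window_days

as_of = date(2024, 6, 30)
print(is_active(date(2024, 6, 28), "paid", as_of))   # True
print(is_active(date(2024, 6, 28), "trial", as_of))  # False: trial users excluded
print(is_active(date(2024, 6, 20), "paid", as_of))   # False: outside the window
```

Because the definition is plain data, it can be reviewed by the PM and reused by every pipeline that needs the metric.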
Designing data models for business metrics and KPIs
Effective business modeling lets you build data structures that mirror real-world business needs, especially the key metrics that drive strategic decisions.
Translating business definitions into data
Consider the example of an “active customer.” If your product manager defines an active customer as someone who logged into the app in the past 7 days, you can bake this definition directly into your data model through a specific flag or a dedicated view. This ensures that every report, dashboard, or query consistently reflects the same meaning, making the data clear and reliable for all users, regardless of their technical background.
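As a rough sketch of the "dedicated view" option, the example below uses Python's built-in `sqlite3` module. The table `app_login`, the view `active_customer`, and the sample rows are assumptions for illustration; in a real warehouse the same idea would be expressed in that platform's SQL dialect.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# SQLite's date('now') is evaluated in UTC, so build the sample data in UTC too.
today = datetime.now(timezone.utc).date()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_login (
    customer_id INTEGER NOT NULL,
    login_date  TEXT NOT NULL  -- ISO 8601 date string
);
-- The agreed definition lives in exactly one place.
CREATE VIEW active_customer AS
SELECT DISTINCT customer_id
FROM app_login
WHERE login_date > date('now', '-7 days');
""")

conn.executemany(
    "INSERT INTO app_login VALUES (?, ?)",
    [
        (1, (today - timedelta(days=1)).isoformat()),   # recent login
        (1, (today - timedelta(days=3)).isoformat()),   # second recent login
        (2, (today - timedelta(days=30)).isoformat()),  # stale login
    ],
)

active_ids = [row[0] for row in conn.execute(
    "SELECT customer_id FROM active_customer ORDER BY customer_id"
)]
print(active_ids)  # [1]
```

Reports then select from `active_customer` instead of re-implementing the 7-day rule, so a change to the definition is a change to one view.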
Building for adaptability
When new metrics emerge, you won’t need to overhaul your entire pipeline. Instead, your well-structured model lets you simply add new views or fields, making the process quicker and less error-prone.
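Here is a minimal sketch of what "simply add a new view or field" can look like when metrics are described as data over a shared event table. The `events` rows, the `METRICS` registry, and the `recent_buyer` metric are all hypothetical names chosen for this example.

```python
from datetime import date

AS_OF = date(2024, 6, 30)  # fixed reporting date so the example is reproducible

# Hypothetical shared event log: (customer_id, event_type, event_date).
events = [
    (1, "login", date(2024, 6, 28)),
    (2, "login", date(2024, 6, 1)),
    (2, "transaction", date(2024, 6, 15)),
    (3, "transaction", date(2024, 3, 10)),
]

# Each metric is a small definition over the same events.
METRICS = {
    "active_customer": {"event": "login", "window_days": 7},
}

def customers_matching(metric_name, as_of=AS_OF):
    """Return sorted customer ids that satisfy the named metric."""
    spec = METRICS[metric_name]
    return sorted({
        customer_id
        for customer_id, event_type, event_date in events
        if event_type == spec["event"]
        and 0 <= (as_of - event_date).days < spec["window_days"]
    })

print(customers_matching("active_customer"))  # [1]

# A new metric arrives: one new entry, no change to the events or to existing metrics.
METRICS["recent_buyer"] = {"event": "transaction", "window_days": 30}
print(customers_matching("recent_buyer"))     # [2]
```

The existing metric keeps returning the same result after the new one is added, which is the property that keeps changes cheap.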
Why this matters
By aligning your data model closely with business logic, you gain:
fewer questions and clarifications from stakeholders,
more accurate, consistent reports,
faster development and deployment of new metrics,
and ultimately, greater trust in your data.
The importance of collaboration and shared understanding
Data modeling is not just a technical exercise; it’s a collaborative process that requires input from business stakeholders, analysts, and engineers. By involving all relevant parties, you ensure that the data model accurately reflects business needs and that everyone has a shared understanding of key concepts and definitions.
This collaborative approach helps prevent misunderstandings, reduces the risk of errors, and ensures that the data architecture can evolve as business requirements change. It also makes it easier to maintain data quality, consistency, and reliability over time.
Summary
Understanding business modeling isn’t a luxury; it’s a must if you want to avoid chaos in your data and constant redesigns.
Key takeaways:
Ask questions about definitions and goals
Align your data model with the business
Design for specific metrics and KPIs
This way, you save time and build trust in your solutions.

