Data-Flow Diagrams 101: Where Your Data Actually Goes

What Is a Data-Flow Diagram?

A data-flow diagram (or DFD) is a picture of a system drawn from the point of view of the data moving through it. Where a flowchart answers "what steps happen, and in what order?", a DFD answers a different question: "where does the data come from, where does it go, and who touches it along the way?" The diagram leaves out time and control flow on purpose. There are no decisions, no loops, no start and end — only processes that transform data, stores that hold it, external actors that produce or consume it, and the arrows that carry it between them.

DFDs were formalised in the late 1970s as part of structured systems analysis. The two classics are Tom DeMarco's Structured Analysis and System Specification and Chris Gane and Trish Sarson's Structured Systems Analysis: Tools and Techniques. They were originally drawn on paper by analysts who needed to explain back-office systems to executives who didn't read code. Nearly half a century later they remain one of the most honest ways to model a system: if the diagram is complete, anything it doesn't show is something the system doesn't do with data.

A minimal DFD: an external Customer sends order details to a Place order process, which writes a new order to the Orders data store. Four elements, three kinds of shape, done.

This article teaches you to read and draw DFDs from scratch. We'll cover the four symbols they're made of, two notation styles (so you recognise both), how to draw a context diagram and then decompose it into levels, the rules that separate a real DFD from a tangled blob, and a worked example — an online bookstore — drawn level by level.

Why DFDs Are Worth Knowing

DFDs solve a specific problem that flowcharts can't: they let you reason about a system without yet deciding when anything runs. The same process boxes will eventually be implemented as services, queues, cron jobs, or async handlers — but before that decision, you just need to know which pieces of data pass through which transformations. A DFD lets you sketch the data backbone without committing to an execution model.

Three situations where DFDs are the right tool:

Documenting a system for auditors, compliance, or privacy review. "Where does personal data live, and who reads or writes it?" is exactly the question a DFD answers. GDPR, HIPAA, and SOC 2 reviews all tend to end with someone drawing one.
Designing the data architecture before the code. Before anyone argues about microservices vs. monolith, the DFD settles what the system has to do with data. The execution shape follows.
Onboarding someone to an unfamiliar system. A good DFD walks a new engineer through the system in a single page — no code reading required — because it names every data store and every process without any implementation noise.

DFDs are also small. A complete system often fits on one sheet of paper at level 0, and the levels below it expand only the pieces that are interesting.

The Four Symbols

Every DFD is built from four kinds of element, each with a specific job. Learn these four and you can read any DFD, regardless of which notation style it was drawn in.

The four DFD symbols. External entities sit outside the system, processes transform data, data stores persist it, and data flows connect everything. No start, no end, no decisions.

Worth noting about each:

External entity. A source or sink of data that lives outside the boundary of the system you are modelling. Customers, suppliers, Stripe, the mail server, and last night's cron job are all external entities. Draw them with a rectangle (Yourdon/DeMarco) or a square with a shadow (Gane-Sarson). External entities cannot talk to each other directly on a DFD; all communication has to flow through a process.
Process. A unit of work that takes in one or more data flows and emits one or more data flows. A process always transforms the data: if the output is identical to the input, it is a passthrough, not a process, and doesn't belong on the diagram. Label a process with a verb phrase ("Validate payment", "Compute shipping") and a number like 1.0 that will let you decompose it later.
Data store. Somewhere data is persisted between uses: a database table, a file, an in-memory cache that outlives a request. Give each store an id (D1, D2, …) and a plural noun name (Orders, Users, Invoices). Data stores are passive: they never transform data on their own; a process has to read from or write to them.
Data flow. A named packet of data in motion, drawn as an arrow with a label like "new order" or "invoice pdf". The label is the content, not the channel: "email" or "HTTP POST" are implementation details that don't belong on a DFD. Every arrow must be labelled; an unlabelled arrow is the DFD equivalent of a bug.
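The four element kinds map naturally onto a small data model. The following is a minimal Python sketch, not a standard library or tool format: the class names, the Kind enum, and the "E1" id for the external entity are assumptions made for illustration. It encodes the minimal Customer, Place order, Orders diagram described earlier.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ENTITY = "external entity"
    PROCESS = "process"
    STORE = "data store"


@dataclass(frozen=True)
class Node:
    id: str      # e.g. "1.0" for a process, "D1" for a store ("E1" is this sketch's own convention)
    name: str    # verb phrase for a process, plural noun for a store
    kind: Kind


@dataclass(frozen=True)
class Flow:
    source: str  # id of the node the data leaves
    target: str  # id of the node the data arrives at
    label: str   # names the content, never the channel


# The minimal diagram: Customer -> Place order -> Orders
nodes = [
    Node("E1", "Customer", Kind.ENTITY),
    Node("1.0", "Place order", Kind.PROCESS),
    Node("D1", "Orders", Kind.STORE),
]
flows = [
    Flow("E1", "1.0", "order details"),
    Flow("1.0", "D1", "new order"),
]

kinds_used = {node.kind for node in nodes}
print(len(kinds_used), "kinds of shape,", len(flows), "labelled flows")
```

Note that a flow is a first-class object with its own label, not just an edge between two nodes. That mirrors the rule that every arrow names what it carries.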

Two Notation Styles

There are two competing notations in wide use. You'll see both; the shapes have slightly different outlines, but the semantics are identical.

Yourdon / DeMarco. Processes are circles. Data stores are parallel lines, open on both ends. External entities are rectangles. This is the style you'll see in academic textbooks and most software-engineering courses — it's the style every illustration in this article uses.
Gane-Sarson. Processes are rounded rectangles. Data stores are open rectangles (a rectangle with one side missing). External entities are squares with a drop shadow. Gane-Sarson is more common in business-analyst and enterprise-architecture circles.

Pick a style, use it consistently, and be ready to read the other one. Tools like CorriDraw, Visio, and Lucidchart ship with both shape sets.

Levels: the Context Diagram and Its Children

DFDs are hierarchical. You start at the outermost level — the whole system as a single process — and then zoom in by replacing that process with its internal sub-processes on a lower-level diagram. The levels have names.

Level 0 — the context diagram. The entire system appears as one circle. Everything outside the system appears as external entities. The only arrows are those that cross the system boundary. A context diagram almost always fits on half a page and is the single most useful DFD for explaining what the system is.
Level 1. That one circle is replaced by several — typically three to seven — each representing a major sub-process. The same external entities reappear around the edges, and new data stores usually show up for the first time. This is where the interesting data paths become visible.
Level 2, 3, … Each process on the previous level can be expanded into another diagram. Keep going only as deep as you need. A level-3 diagram of a moderately complex sub-process is usually enough; if you find yourself drawing a level-5 DFD, you've probably crossed into implementation detail and should stop.
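Process numbers are what tie the levels together. The sketch below assumes one common numbering convention, which this article does not spell out: the context process is "0", level-1 processes are "1.0", "2.0", and so on, the children of 2.0 are "2.1", "2.2", and their children are "2.1.1", "2.1.2". Under that assumption, both the level and the parent can be derived from the number alone.

```python
from typing import Optional


def level_of(process_id: str) -> int:
    """Diagram level on which a process with this number is drawn."""
    parts = process_id.split(".")
    if parts == ["0"]:
        return 0                       # the single context-diagram process
    if len(parts) == 2 and parts[1] == "0":
        return 1                       # "1.0", "2.0", ... live on level 1
    return len(parts)                  # "2.1" is level 2, "2.1.3" is level 3


def parent_of(process_id: str) -> Optional[str]:
    """Number of the process that this one was decomposed from."""
    parts = process_id.split(".")
    if parts == ["0"]:
        return None                    # the context process has no parent
    if len(parts) == 2 and parts[1] == "0":
        return "0"                     # level-1 processes expand the context process
    if len(parts) == 2:
        return parts[0] + ".0"         # "2.1" expands "2.0"
    return ".".join(parts[:-1])        # "2.1.3" expands "2.1"


for pid in ["0", "2.0", "2.1", "2.1.3"]:
    print(pid, "level", level_of(pid), "parent", parent_of(pid))
```

If your team numbers processes differently, only these two functions change; the idea that a number encodes its place in the hierarchy stays the same.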

The context diagram for an online bookstore. One circle for the whole system, four external entities, and only arrows that cross the boundary. A new hire can understand the business in 30 seconds.

The context diagram is deceptively simple but it forces you to answer two questions honestly: what is inside the system (everything represented by that single circle) and what is outside (everything else). The boundary is the whole point. If you can't agree on what's inside your own system, nobody downstream will be able to either.
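The boundary rule for a context diagram is mechanical enough to check in code. This is a rough sketch under stated assumptions: the entity names (Supplier, Payment provider, Courier) and all flow labels are hypothetical stand-ins, since only the figure names the bookstore's real entities. A flow belongs on the context diagram only if exactly one of its ends is the system itself.

```python
SYSTEM = "Bookstore"

# (source, target, label); every name except "Customer" is a hypothetical example.
context_flows = [
    ("Customer", SYSTEM, "order details"),
    (SYSTEM, "Customer", "order confirmation"),
    (SYSTEM, "Payment provider", "charge request"),
    ("Payment provider", SYSTEM, "charge result"),
    (SYSTEM, "Supplier", "restock order"),
    (SYSTEM, "Courier", "shipping request"),
]


def boundary_violations(flows, system):
    """Flows that do not cross the system boundary exactly once."""
    return [
        flow for flow in flows
        if (flow[0] == system) == (flow[1] == system)  # both ends inside, or both outside
    ]


def external_entities(flows, system):
    """Everything that appears on a flow and is not the system itself."""
    return sorted({end for s, t, _ in flows for end in (s, t) if end != system})


print(boundary_violations(context_flows, SYSTEM))
print(external_entities(context_flows, SYSTEM))
```

Listing the entities this way is a quick test of the "what is outside?" question: if a name in that list surprises anyone on the team, the boundary is not yet agreed.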

A Worked Example: the Bookstore at Level 1

Now we zoom in. The single Bookstore circle becomes a handful of named processes, and the data stores that hold the business's records finally appear.

Level 1 of the bookstore. Four processes, three data stores, and the same four external entities from the context diagram. Every arrow carries a named data flow — no "HTTP", no "queue", no implementation noise.

A few things worth pointing out in this diagram:

The external entities are the same as in the context diagram. When you decompose level 0 into level 1, the set of external entities does not change, and neither do the flows that cross the system boundary; only the inside of the system is expanded. This property is called balancing, and it's how reviewers verify you didn't lose or invent a data flow during decomposition.
Every process has at least one input and one output. A process with only inputs is a "black hole" (data disappears into it); a process with only outputs is a "miracle" (data appears from nowhere). Both are mistakes. Every process is a transformation, so it must have both sides.
Data stores connect only to processes. D1 Orders is never connected directly to the Customer entity, because an external actor can't reach into your database. They talk to a process (Validate), which talks to the store.
Arrows have short, specific labels: "new order", "charge result", "stock". Not "data" or "info"; those labels say nothing. A good label tells the reader what's moving without needing the context of the rest of the diagram.
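The "black hole" and "miracle" checks from the list above can be automated once the flows are written down as data. The sketch below is one possible way to do it; the process numbers and flow labels are hypothetical and are not taken from the bookstore figure.

```python
def classify_processes(process_ids, flows):
    """Report processes that are missing an input side, an output side, or both.

    flows is an iterable of (source, target, label) tuples.
    """
    receives = {target for _, target, _ in flows}
    emits = {source for source, _, _ in flows}
    report = {}
    for pid in process_ids:
        if pid not in receives and pid not in emits:
            report[pid] = "disconnected"
        elif pid not in emits:
            report[pid] = "black hole"   # data goes in, nothing comes out
        elif pid not in receives:
            report[pid] = "miracle"      # data comes out of nowhere
    return report


# Hypothetical fragment with two deliberate mistakes.
processes = ["1.0", "2.0", "3.0"]
flows = [
    ("Customer", "1.0", "new order"),
    ("1.0", "D1", "valid order"),
    ("1.0", "3.0", "charge request"),   # 3.0 never emits anything
    ("2.0", "D1", "order total"),       # 2.0 never receives anything
]

print(classify_processes(processes, flows))
```

A process that passes this check is not necessarily correct, but one that fails it is certainly unfinished.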

The Rules for a Clean DFD

Five rules cover almost every DFD you'll ever need to draw or review. Print them on a sticky note; they will save hours of review time.

No entity-to-entity flows. Two external entities never have a direct arrow between them on a DFD — the system doesn't handle that data, so it doesn't belong. If a customer talks to a supplier without your system being involved, that's not your diagram's business.
No store-to-store flows. Data doesn't move between stores on its own. A process has to read from one store and write to the other.
No entity-to-store flows. Same reason — an external actor can't read or write your stores directly. A process always mediates.
Every flow is labelled. Every single arrow gets a noun or noun phrase that describes its content. An arrow with no label means "I don't know what this is", which means you haven't finished the diagram.
Balance the levels. When you expand process 2.0 into its own level-2 diagram, the data flows crossing the edge of that diagram must exactly match the flows into and out of 2.0 on its parent. Otherwise you've invented data or lost data somewhere between levels.
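The first three rules collapse into a single test: every flow must have a process on at least one end. Together with the labelling rule, that is easy to lint. The following is a small sketch, assuming nodes are described by a plain dictionary from id to kind; the example nodes and labels are hypothetical. The balancing rule needs two diagrams to compare, so it is out of scope here.

```python
def check_flows(kinds, flows):
    """Return a list of rule violations for one diagram.

    kinds maps node id -> "entity", "process", or "store".
    flows is a list of (source, target, label) tuples.
    """
    problems = []
    for source, target, label in flows:
        pair = (kinds[source], kinds[target])
        if "process" not in pair:
            # Covers entity-to-entity, store-to-store, and entity-to-store in either direction.
            problems.append(
                f"{source} -> {target}: {pair[0]}-to-{pair[1]} flow has no process to mediate it"
            )
        if not label.strip():
            problems.append(f"{source} -> {target}: unlabelled flow")
    return problems


kinds = {
    "Customer": "entity",
    "Supplier": "entity",
    "1.0": "process",
    "D1": "store",
    "D2": "store",
}
flows = [
    ("Customer", "1.0", "new order"),      # fine
    ("1.0", "D1", "valid order"),          # fine
    ("Customer", "D1", "new order"),       # entity-to-store
    ("D1", "D2", "order copy"),            # store-to-store
    ("Customer", "Supplier", "complaint"), # entity-to-entity
    ("1.0", "D2", ""),                     # unlabelled
]

for problem in check_flows(kinds, flows):
    print(problem)
```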

DFD vs. Flowchart: Two Very Different Diagrams

The single most common misconception is that a DFD is "a flowchart about data". It isn't. A flowchart and a DFD answer different questions about the same system.

Same fragment of a system, two diagram types. The flowchart asks "what happens, and in what order?" The DFD asks "what data moves, and where is it stored?" Neither replaces the other.

Rules of thumb for choosing between them:

If there's a decision, a loop, or a time-ordered sequence you need to show, draw a flowchart.
If there's a data store, a record being transformed, or an external system you need to name, draw a DFD.
For a real system, you often want both — the DFD for data architecture, the flowchart for a specific user journey or an error-handling path.

Common Beginner Mistakes

Six mistakes account for almost every unreadable DFD. Watch for them when you review your own drafts.

Drawing control flow. If you find yourself wanting to add a decision diamond or a timing arrow, you're drawing a flowchart, not a DFD. Put the control-flow logic on a separate diagram.
Unlabelled arrows. An arrow with no label is a half-finished thought. Every arrow must carry a noun phrase that names what's moving.
"Data" as a label. The whole diagram is about data. An arrow labelled "data" tells the reader literally nothing. Say "order record", "invoice", or "auth token" instead.
Store-to-store, entity-to-entity, entity-to-store flows. All three are forbidden. If you need data to move from store A to store B, draw the process that does the moving.
"Black hole" or "miracle" processes. A process must have at least one input and at least one output — it's a transformation, not a sink or a source. If your Compute total circle has no inputs, you've forgotten to draw where the line items come from.
Level imbalance. The arrows entering and leaving process 2.0 on level 1 must match exactly the arrows that cross the boundary of the level-2 diagram that expands 2.0. When they don't, reviewers will find it — and will wonder what else is missing.
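Level imbalance is the one mistake that spans two diagrams, so it is the easiest to miss by eye and a good candidate for an automated check. The sketch below compares the flows touching a process on the parent diagram with the flows that cross the edge of its child diagram. It assumes labels must match exactly, and all process numbers, stores, and labels are hypothetical.

```python
def parent_boundary(parent_flows, pid):
    """Flows into and out of process pid on the parent diagram, as (direction, label)."""
    result = set()
    for source, target, label in parent_flows:
        if target == pid:
            result.add(("in", label))
        if source == pid:
            result.add(("out", label))
    return result


def child_boundary(child_flows, internal):
    """Flows that cross the edge of the child diagram.

    internal is the set of node ids that exist only inside the child diagram.
    """
    result = set()
    for source, target, label in child_flows:
        if source in internal and target not in internal:
            result.add(("out", label))
        elif target in internal and source not in internal:
            result.add(("in", label))
    return result


def balance_report(parent_flows, pid, child_flows, internal):
    parent = parent_boundary(parent_flows, pid)
    child = child_boundary(child_flows, internal)
    return {"lost": sorted(parent - child), "invented": sorted(child - parent)}


# Hypothetical level-1 fragment around process 2.0 ...
parent_flows = [
    ("1.0", "2.0", "valid order"),
    ("2.0", "Payment provider", "charge request"),
    ("Payment provider", "2.0", "charge result"),
    ("2.0", "D1", "paid order"),
]
# ... and the level-2 diagram that expands it into 2.1 and 2.2.
child_flows = [
    ("1.0", "2.1", "valid order"),
    ("2.1", "Payment provider", "charge request"),
    ("2.1", "2.2", "pending charge"),          # internal, so it is not compared
    ("Payment provider", "2.2", "charge result"),
    ("2.2", "D1", "paid order"),
]

print(balance_report(parent_flows, "2.0", child_flows, {"2.1", "2.2"}))
```

An empty "lost" list and an empty "invented" list mean the two levels agree. Anything else is exactly the discrepancy a reviewer would otherwise have to find by hand.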

Where to Go Next

DFDs fit into a small ecosystem of structured-analysis diagrams. Three natural next steps once you're comfortable with them:

Entity-relationship diagrams. DFDs name the data stores; ER diagrams describe the shape of the records inside those stores. The two diagrams are complementary — a DFD tells you there's a D1 Orders store; an ER diagram tells you an Order has an id, a status, a customer_id, and a list of line items.
Sequence diagrams. When you need to show how a particular request travels between the processes in your DFD — the order of calls, the synchronous/asynchronous distinction — a UML sequence diagram is the matching tool.
C4 architecture diagrams. A more recent, hierarchical way to diagram software that, like a levelled DFD, zooms in step by step (system context, containers, components, code) and adds a clearer notion of technology boundaries. If your audience asks "what runs where?", C4 is probably a better fit than a DFD.
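To make the DFD and ER split concrete, here is how the Order record from the entity-relationship item above might look as a type. The Order fields (id, status, customer_id, line items) come from that description. The LineItem fields and the total_cents helper are assumptions added for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineItem:
    book_id: str           # hypothetical field
    quantity: int          # hypothetical field
    unit_price_cents: int  # hypothetical field; integer cents avoid float rounding


@dataclass
class Order:
    id: str
    status: str
    customer_id: str
    line_items: List[LineItem] = field(default_factory=list)

    def total_cents(self) -> int:
        return sum(item.quantity * item.unit_price_cents for item in self.line_items)


order = Order(
    id="o-1",
    status="new",
    customer_id="c-42",
    line_items=[LineItem("b-7", 2, 1250), LineItem("b-9", 1, 800)],
)
print(order.total_cents())  # 2 * 1250 + 1 * 800 = 3300
```

The DFD would show only that a process writes a "new order" flow into D1 Orders. Everything in this block is the ER diagram's side of the story.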

A Note on Tools

Every idea in this article is about the notation — the four shapes, the levelling rules, the balancing property — not about any particular tool. A DFD drawn on a whiteboard reads the same as a DFD drawn in Visio, Lucidchart, or a text-based tool like Mermaid. If you want a canvas with both Yourdon/DeMarco and Gane-Sarson shape sets, arrow-snapping that keeps flow labels tidy, and a way to collaborate in real time while you decompose the diagram, CorriDraw — the tool this blog lives on — is one option. The ideas transfer regardless. Pick whatever draws cleanly; the notation is the same everywhere.
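For the text-based route, a DFD held as plain data can be turned into Mermaid source with a few lines of code. The sketch below uses Mermaid's general flowchart syntax and only approximates the Yourdon/DeMarco shapes: a rectangle for an entity, a circle for a process, and a cylinder standing in for a data store. The function name and input format are this example's own.

```python
SHAPES = {
    "entity": '{id}["{name}"]',     # rectangle
    "process": '{id}(("{name}"))',  # circle
    "store": '{id}[("{name}")]',    # cylinder, used here in place of parallel lines
}


def to_mermaid(nodes, flows):
    """Render a DFD as Mermaid flowchart source.

    nodes maps node id -> (kind, display name).
    flows is a list of (source id, target id, label) tuples.
    """
    # Ids such as "1.0" are replaced by simple aliases to keep the output safe to parse.
    alias = {node_id: f"n{i}" for i, node_id in enumerate(nodes)}
    lines = ["flowchart LR"]
    for node_id, (kind, name) in nodes.items():
        lines.append("    " + SHAPES[kind].format(id=alias[node_id], name=name))
    for source, target, label in flows:
        lines.append(f"    {alias[source]} -->|{label}| {alias[target]}")
    return "\n".join(lines)


nodes = {
    "E1": ("entity", "Customer"),
    "1.0": ("process", "1.0 Place order"),
    "D1": ("store", "D1 Orders"),
}
flows = [
    ("E1", "1.0", "order details"),
    ("1.0", "D1", "new order"),
]

print(to_mermaid(nodes, flows))
```

Keeping the diagram as data and generating the drawing from it means the same source can also feed automated rule checks, whichever tool ends up rendering it.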