Data Flow Diagrams

Systems Analysis

Chris Wells, September 27, 2022

Data flow diagrams (DFDs) are a way of representing a system's business processes, the flow of data into and out of those processes, and the flow of data between the system and the external agencies with which it interacts. The resulting graphical views of the system, its processes, and its data flows can be used as a basis for discussion between system developers and users of the system. A hierarchy of DFDs is produced, starting with an overview that provides a very abstract view of the system and ending with a number of diagrams representing the lowest-level sub-processes. The highest level DFD is the context diagram, which simply shows the system of interest, the external entities with which it interacts, and the data flows between the system and the external entities. A typical context diagram is shown below.

A context diagram for an order management system

You will notice several things about this diagram. An external entity (in this case "Customer") is shown as an ellipse, appropriately labelled. An external entity is either a source of data entering the system, or a destination for data leaving the system, or (more often than not) both. The system itself is simply shown as a rectangular box with a suitable name. A data flow is represented by a line, suitably labelled, with an arrow at one end (or in some cases both ends) showing the direction in which the data is flowing. The Customer external entity is duplicated on the diagram for the sake of clarity, to avoid too many data flows close together. The diagonal line in the top left corner of the Customer external entity symbol indicates that more than one instance of this entity appears on the diagram. The last thing of note is the representation of a resource flow ("Goods"), which is shown as a line with an outline arrowhead (or sometimes a double-headed arrow) as opposed to a solid arrowhead. The context diagram serves to define the system boundary. Any entity with which the system interacts, but which is not a part of the system itself, is an external entity.
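The elements of a context diagram described above can also be captured in a small data structure, which is sometimes handy for keeping a list of inputs and outputs alongside the drawing. The following Python sketch is only an illustration: the class names, the system name "Order management system" and the "Enquiry" and "Quotation" flow labels are assumptions made for the example. Only the "Customer" entity and the "Goods" resource flow are taken from the diagram discussed in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFlow:
    label: str
    source: str
    target: str
    is_resource: bool = False  # True for a resource flow such as "Goods"


@dataclass
class ContextDiagram:
    system: str
    external_entities: set = field(default_factory=set)
    flows: list = field(default_factory=list)

    def add_flow(self, label, source, target, is_resource=False):
        # On a context diagram every flow crosses the system boundary,
        # so exactly one end of the flow must be the system itself.
        if (source == self.system) == (target == self.system):
            raise ValueError(f"flow '{label}' must link the system to an external entity")
        entity = target if source == self.system else source
        self.external_entities.add(entity)
        self.flows.append(DataFlow(label, source, target, is_resource))

    def inputs(self):
        # Labels of flows entering the system
        return [f.label for f in self.flows if f.target == self.system]

    def outputs(self):
        # Labels of flows leaving the system
        return [f.label for f in self.flows if f.source == self.system]


# Hypothetical flows; only "Customer" and "Goods" appear in the text.
diagram = ContextDiagram("Order management system")
diagram.add_flow("Enquiry", "Customer", diagram.system)
diagram.add_flow("Quotation", diagram.system, "Customer")
diagram.add_flow("Goods", diagram.system, "Customer", is_resource=True)

print(diagram.inputs())   # ['Enquiry']
print(diagram.outputs())  # ['Quotation', 'Goods']
```

Because the set of external entities is derived from the flows, anything that exchanges data with the system but is not the system itself ends up outside the boundary, which mirrors the definition given above.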

Before doing too much work on a set of data flow diagrams, it is worth drawing up a list of the external entities providing inputs to or receiving outputs from the system, and identifying those inputs and outputs. In addition, it would be useful to identify all of the high-level business activities included within the system boundary, and relate these activities to specific inputs and outputs. Useful questions to ask include "Who does what, when, where and how?" and "What data does each of these people need to carry out their tasks?".

Physical versus Logical DFDs

Existing and proposed systems can be modelled using physical and logical DFDs. A physical DFD shows how the system is (or will be) constructed, whereas a logical DFD is not concerned with the physical aspects of the system.

Physical DFDs clarify which processes are manual and which are automated, and describe processes in more detail than logical DFDs. They also show the sequence in which processes must be carried out, identify temporary data stores, specify the actual names of files and printouts, and define any controls used to ensure that processes are carried out correctly.

Logical DFDs concentrate on the logical flow of data between business processes rather than the physical implementation of the system, and allow analysts to understand the business more clearly. They attempt to rationalise the lowest-level processes and group them together to form the Level 1 DFD. They also attempt to rationalise the data stores in the system, to relate each data store to one or more entities in the Logical Data Structure (LDS), and to ensure that each entity is found in only one data store. The logical DFD provides a solid basis on which to carry out a discussion of the system with users, and results in more stable systems. It also facilitates the elimination of redundancy, and makes it easier to create the final physical model.
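The requirement that each entity should be found in only one data store is easy to check mechanically once the mapping between data stores and LDS entities has been written down. The sketch below assumes that mapping is held as a Python dictionary. The store IDs and entity names are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def entities_in_multiple_stores(store_entities):
    """Return {entity: [store ids]} for every entity held in more than one store."""
    stores_by_entity = defaultdict(set)
    for store, entities in store_entities.items():
        for entity in entities:
            stores_by_entity[entity].add(store)
    return {
        entity: sorted(stores)
        for entity, stores in stores_by_entity.items()
        if len(stores) > 1
    }


# Hypothetical mapping of data stores to LDS entities
store_entities = {
    "D1 Sales orders": {"Order", "Order line"},
    "D2 Quotations": {"Quotation", "Customer"},
    "D3 Invoices": {"Invoice", "Customer"},
}

print(entities_in_multiple_stores(store_entities))
# {'Customer': ['D2 Quotations', 'D3 Invoices']}
```

In this made-up example the "Customer" entity is reported because it appears in two stores, which would suggest either a separate customer data store or a revised mapping.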

Data flow diagrams can be used to represent the system, not only at different levels of detail, but from different perspectives. The four main types of data flow diagram are described below:

  • Current logical DFD - describes what the system does, but not necessarily how it does it. This is useful for discussing the functionality of the system without getting bogged down in too much detail.
  • Current physical DFD - describes what the system does and how the functionality is currently implemented. This type of diagram is useful for highlighting redundant processes and data stores, and for giving the analyst an insight into how the system operates in its present form.
  • Required logical DFD - describes what the new system must be able to do, but not necessarily how it should do it. This is useful for achieving consensus between developers and users on a requirements specification.
  • Required physical DFD - describes what the new system will do and how the functionality will be implemented. This type of diagram is produced during the design stage, and is useful for conveying to users how the system will be implemented.

The required physical and logical DFDs should reflect any changes that are to be made to the existing system, together with any additional features to be incorporated into the new system. These changes and additions should be recorded in a document called the requirements catalogue. The additional features proposed for the new system can be further categorised into those that are mandatory ("must have" features), and those that are seen as desirable but not essential ("would like to have" features).

The DFD Hierarchy

DFDs at different levels are part of a structured hierarchy, with the lowest tier showing the greatest level of detail. Functional decomposition is carried out until an appropriate level of detail has been attained, and a definitive Elementary Process Description (EPD) is defined for each lowest level process. DFDs are used to analyse the system to ensure that the final design is complete, and to provide important system documentation. As already mentioned, the highest level of DFD is the context diagram, which shows the system's relationship to the external entities with which it interacts.

A context diagram for a simple library system

We can represent the high-level processes within the system of interest by creating a level 1 data flow diagram. An example of a level 1 DFD for our order management system is shown below.

A level 1 data flow diagram for an order management system

We now have three processes, "Manage enquiry", "Manage order", and "Manage sales ledger". Each process is represented by a rectangle, subdivided into three smaller rectangles. Each has a descriptive name that provides a clue as to the type of activity taking place within it, and an ID number in the top left corner. Note that the ordering of these ID numbers is purely arbitrary, and there is no priority implied by it. The space to the right of the ID number can be used, if required, to identify the person or department responsible for the process, or the location at which it occurs. A process is some activity that receives data, transforms it in some way, and (usually) outputs it again in a modified format.

We also have three data stores, "Sales orders", "Quotations" and "Invoices". The data store symbol is also a rectangle, subdivided into two smaller rectangles and open at one end. The boxed-in area at the left side of the data store symbol contains an ID number, prefixed with an upper-case "D" (for data, presumably!). Again, no priority is implied by the numbering of data stores. To the right of the ID number is the name of the data store, which usually gives a clue as to the kind of information held. The data store is a generic representation of some physical or electronic data storage medium, such as index cards or a database file. Like external entities, data stores can be duplicated on the same diagram for the sake of clarity.

Let's see another example. A level 1 DFD for our simple library system might look like this:

A Level 1 DFD for a simple library system

Further analysis

The context diagram and level 1 DFD may well be re-drawn a number of times before a consensus is reached between developers and users that the diagrams accurately represent all of the high-level processes, data stores, and data flows. Looking at the level 1 DFD for the order management system, for example, it becomes apparent that we do not currently have a data store for customer information. Once these diagrams are considered to be substantially correct, however, each of the high-level processes included in the level 1 DFD will probably require further analysis to break it down into its constituent sub-processes, resulting in a level 2 DFD being produced for each process shown on the level 1 DFD. A level 2 DFD for the "Manage enquiry" process is shown below.

A level 2 data flow diagram for the "Manage enquiry" process

Each (parent) process in the level 1 DFD will be decomposed into lower level (child) sub-processes. The lower level processes may be further decomposed if necessary, although it is unusual to have to do this beyond level 3, and often level 2 is sufficient. Note that the data flows in and out of a parent DFD must be evident in the child DFD. Note also that the ID number allocated to the parent process is carried down to each child process. In the example above, the "Manage enquiry" process has an ID number of 2, and the three sub-processes have the ID numbers 2.1, 2.2 and 2.3. Once a low level process is considered to be a discrete task that is sufficiently atomic in nature, no further decomposition is necessary and an elementary process description (EPD) can be produced for each low-level process (see example below).
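The numbering and balancing rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full modelling tool: flows are assumed to be held as (label, source, target) tuples, and all of the flow labels and the "D2" store ID are hypothetical. Only the parent ID 2 and the child IDs 2.1 to 2.3 come from the "Manage enquiry" example.

```python
def child_ids(parent_id, count):
    # The parent's ID is carried down as a prefix: 2 -> 2.1, 2.2, 2.3
    return [f"{parent_id}.{n}" for n in range(1, count + 1)]


def boundary_flows(flows, inside):
    """Return the (label, direction) pairs for flows that cross the boundary of 'inside'."""
    crossing = set()
    for label, source, target in flows:
        if source in inside and target not in inside:
            crossing.add((label, "out"))
        elif target in inside and source not in inside:
            crossing.add((label, "in"))
    return crossing


def balance_report(parent_id, parent_flows, child_flows, children):
    """Compare the flows of a parent process with those crossing its child DFD."""
    parent = boundary_flows(parent_flows, {parent_id})
    child = boundary_flows(child_flows, set(children))
    return {
        "missing_in_child": sorted(parent - child),
        "extra_in_child": sorted(child - parent),
    }


children = child_ids("2", 3)  # ['2.1', '2.2', '2.3']

# Hypothetical flows on the level 1 DFD that touch process 2
parent_flows = [
    ("Enquiry", "Customer", "2"),
    ("Quotation details", "2", "D2"),
    ("Quotation", "2", "Customer"),
]

# Hypothetical flows on the level 2 DFD for process 2
child_flows = [
    ("Enquiry", "Customer", "2.1"),
    ("Enquiry details", "2.1", "2.2"),
    ("Quotation details", "2.2", "D2"),
    ("Priced enquiry", "2.2", "2.3"),
    ("Quotation", "2.3", "Customer"),
]

print(balance_report("2", parent_flows, child_flows, children))
# {'missing_in_child': [], 'extra_in_child': []}
```

Flows between two child processes (such as "Enquiry details" here) are internal to the child diagram and are ignored by the comparison. Only flows that cross the boundary of the parent process need to match.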

Elementary process descriptions for "Manage enquiry"

DFD Guidelines

When drawing data flow diagrams, you should attempt to comply with the following requirements:

  • All processes must have at least one data flow in, and one data flow out
  • Each process should represent only one activity at a particular level
  • Each data store must have both inputs and outputs, and relate to at least one data flow
  • Each external entity must relate to at least one data flow
  • Each data flow must be attached to at least one process
  • A data flow from an external entity must flow into a process
  • A data flow to an external entity must flow from a process
  • A data flow to a data store can only come from a process
  • A data flow from a data store can only go to a process
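Most of the guidelines above can be checked automatically if the diagram is held as data. The following Python sketch assumes a DFD is described by a dictionary of named nodes (each tagged as a process, data store or external entity) and a list of (label, source, target) flows. The function name, the data layout and the flow labels are assumptions made for illustration. The guideline that each process should represent only one activity is a matter of judgement and is not covered.

```python
PROCESS, STORE, ENTITY = "process", "data store", "external entity"


def check_dfd(nodes, flows):
    """Return a list of guideline violations (an empty list means none were found).

    nodes: {name: kind}; flows: [(label, source, target)].
    Every flow must refer to names present in 'nodes'.
    """
    problems = []
    inbound = {name: 0 for name in nodes}
    outbound = {name: 0 for name in nodes}

    for label, source, target in flows:
        outbound[source] += 1
        inbound[target] += 1
        # Stores and entities may only be linked via a process, so at
        # least one end of every flow has to be a process.
        if PROCESS not in (nodes[source], nodes[target]):
            problems.append(f"flow '{label}' ({source} -> {target}) is not attached to a process")

    for name, kind in nodes.items():
        if kind == PROCESS:
            if inbound[name] == 0:
                problems.append(f"process '{name}' has no data flow in")
            if outbound[name] == 0:
                problems.append(f"process '{name}' has no data flow out")
        elif kind == STORE:
            if inbound[name] == 0 or outbound[name] == 0:
                problems.append(f"data store '{name}' needs both an input and an output")
        elif kind == ENTITY:
            if inbound[name] + outbound[name] == 0:
                problems.append(f"external entity '{name}' has no data flows")
    return problems


# Hypothetical fragment of an order management DFD
nodes = {
    "Customer": ENTITY,
    "Manage enquiry": PROCESS,
    "Manage order": PROCESS,
    "Quotations": STORE,
}
flows = [
    ("Enquiry", "Customer", "Manage enquiry"),
    ("Quotation details", "Manage enquiry", "Quotations"),
    ("Quotation details", "Quotations", "Manage order"),
    ("Order confirmation", "Manage order", "Customer"),
]

print(check_dfd(nodes, flows))  # []

# Connecting a data store directly to an external entity is reported
print(check_dfd(nodes, flows + [("Quotation", "Quotations", "Customer")]))
# ["flow 'Quotation' (Quotations -> Customer) is not attached to a process"]
```

The single "attached to a process" test covers the last four guidelines in the list, since a flow whose other end is an external entity or a data store can only satisfy it by starting or ending at a process.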

Common errors in DFDs include the omission of data flows, showing data flowing in the wrong direction, connecting data stores and external entities directly to each other, and incorrectly labelling processes or data flows. As a general rule, it is recommended not to include more than twelve processes on a DFD. A lower-level (child) DFD should have the same data flows in and out as its parent process.

This article is adapted from material first published on the TechnologyUK.net website in January 2009.