Data flow diagrams (DFDs) are a way of representing a system's business processes, the flow of data into and out of those processes, and the flow of data between the system and the external agencies with which it interacts. The resulting graphical views of the system, its processes, and its data flows can be used as a basis for discussion between system developers and users of the system. A hierarchy of DFDs is produced, starting with an overview that provides a very abstract view of the system and ending with a number of diagrams representing the lowest-level sub-processes. The highest level DFD is the context diagram, which simply shows the system of interest, the external entities with which it interacts, and the data flows between the system and the external entities. A typical context diagram is shown below.
You will notice several things about this diagram. An external entity (in this case "Customer") is shown as an ellipse, appropriately labelled. An external entity is either a source of data entering the system, or a destination for data leaving the system, or (more often than not) both. The system itself is simply shown as a rectangular box with a suitable name. A data flow is represented by a line, suitably labelled, with an arrow at one end (or in some cases both ends) showing in which direction the data is flowing. The Customer external entity is duplicated on the diagram for the sake of clarity, to avoid too many data flows close together. The diagonal line in the top left corner of the Customer external entity symbol is to indicate that more than one instance of this entity appears on the diagram. The last thing of note is the representation of a resource flow ("Goods"), which is shown as a line with an outline arrow-head (or sometimes a double-headed arrow) as opposed to a solid arrowhead. The context diagram serves to define the system boundary. Any entity with which the system interacts, but which is not a part of the system itself, is an external entity.
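The elements of a context diagram can also be written down as plain data, which is sometimes handy for checking a diagram before it is drawn. The following is a minimal sketch only: the class `DataFlow`, the helper functions, and every flow label other than "Goods" are hypothetical names chosen for illustration, not part of any DFD standard or of the diagram described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    label: str
    is_resource: bool = False  # True for a resource flow such as "Goods"


SYSTEM = "Order management system"  # assumed name for the system box

# Hypothetical flows between the system and its single external entity.
context_flows = [
    DataFlow("Customer", SYSTEM, "Enquiry"),
    DataFlow(SYSTEM, "Customer", "Quotation"),
    DataFlow("Customer", SYSTEM, "Order"),
    DataFlow(SYSTEM, "Customer", "Invoice"),
    DataFlow(SYSTEM, "Customer", "Goods", is_resource=True),
]


def external_entities(flows, system):
    """Return every flow endpoint that is not the system itself."""
    return {end for f in flows for end in (f.source, f.target)} - {system}


def entity_roles(flows, system):
    """Classify each external entity as a source, a destination, or both."""
    roles = {}
    for f in flows:
        if f.target == system:
            roles.setdefault(f.source, set()).add("source")
        if f.source == system:
            roles.setdefault(f.target, set()).add("destination")
    return roles


print(external_entities(context_flows, SYSTEM))
print(entity_roles(context_flows, SYSTEM))
```

Anything that appears as a flow endpoint but is not the system itself falls outside the system boundary, which is the same test the context diagram applies visually.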
Before doing too much work on a set of data flow diagrams, it is worth drawing up a list of the external entities providing inputs to or receiving outputs from the system, and identifying those inputs and outputs. In addition, it would be useful to identify all of the high-level business activities included within the system boundary, and relate these activities to specific inputs and outputs. The questions to be asked include "Who does what, when, where and how?" and "What data does each of these people need to carry out their tasks?".
Existing and proposed systems can be modelled using physical and logical DFDs. A physical DFD shows how the system is (or will be) constructed, whereas a logical DFD is not concerned with the physical aspects of the system.
Physical DFDs clarify which processes are manual and which are automated, and describe processes in more detail than logical DFDs. They also show the sequence in which processes must be carried out, identify temporary data stores, specify the actual names of files and printouts, and define any controls used to ensure that processes are carried out correctly.
Logical DFDs concentrate on the logical flow of data between business processes rather than the physical implementation of the system, and allow analysts to understand the business more clearly. They attempt to rationalise the lowest-level processes and group them together to form the Level 1 DFD. They also attempt to rationalise the data stores in the system, to relate each data store to one or more entities in the Logical Data Structure (LDS), and to ensure that each entity is found in only one data store. The logical DFD provides a solid basis on which to carry out a discussion of the system with users, and results in more stable systems. It also facilitates the elimination of redundancy, and makes it easier to create the final physical model.
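The rule that each entity should be found in only one data store lends itself to a simple mechanical check. The sketch below assumes the analyst has recorded, for each data store, the LDS entities it holds; the store IDs, the entity names, and the function `entities_in_multiple_stores` are all hypothetical and used purely for illustration.

```python
from collections import defaultdict


def entities_in_multiple_stores(store_entities):
    """Map each entity held in more than one data store to those stores."""
    stores_by_entity = defaultdict(list)
    for store, entities in store_entities.items():
        for entity in entities:
            stores_by_entity[entity].append(store)
    return {
        entity: sorted(stores)
        for entity, stores in stores_by_entity.items()
        if len(stores) > 1
    }


# Hypothetical mapping of data stores to LDS entities.
store_entities = {
    "D1 Sales orders": {"Order", "Order line"},
    "D2 Quotations": {"Quotation"},
    "D3 Invoices": {"Invoice", "Order line"},
}

print(entities_in_multiple_stores(store_entities))
```

In this made-up mapping the "Order line" entity is reported because it appears in two stores, which is the kind of redundancy a logical DFD is meant to expose.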
Data flow diagrams can be used to represent the system, not only at different levels of detail, but from different perspectives. The four main types of data flow diagram are the current physical DFD, the current logical DFD, the required logical DFD, and the required physical DFD.
The required physical and logical DFDs should reflect any changes that are to be made to the existing system, together with any additional features to be incorporated into the new system. These changes and additions should be recorded in a document called the requirements catalogue. The additional features proposed for the new system can be further categorised into those that are mandatory ("must have" features), and those that are seen as desirable but not essential ("would like to have" features).
DFDs at different levels are part of a structured hierarchy, with the lowest tier showing the greatest level of detail. Functional decomposition is carried out until an appropriate level of detail has been attained, and a definitive Elementary Process Description (EPD) is defined for each lowest level process. DFDs are used to analyse the system to ensure that the final design is complete, and to provide important system documentation. As already mentioned, the highest level of DFD is the context diagram, which shows the system's relationship to the external entities with which it interacts.
We can represent the high-level processes within the system of interest by creating a level 1 data flow diagram. An example of a level 1 DFD for our order management system is shown below.
We now have three processes, "Manage enquiry", "Manage order", and "Manage sales ledger". Each process is represented by a rectangle, subdivided into three smaller rectangles. Each has a descriptive name that provides a clue as to the type of activity taking place within it, and an ID number in the top left corner. Note that the ordering of these ID numbers is purely arbitrary, and there is no priority implied by it. The space to the right of the ID number can be used, if required, to identify the person or department responsible for the process, or the location at which it occurs. A process is some activity that receives data, transforms it in some way, and (usually) outputs it again in a modified format.
We also have three data stores, "Sales orders", "Quotations" and "Invoices". The data store symbol is also a rectangle, subdivided into two smaller rectangles and open at one end. The boxed in area at the left side of the data store symbol contains an ID number, prefixed with an upper case "D" (for data, presumably!). Again, no priority is implied by the numbering of data stores. To the right of the ID number is the name of the data store, which usually gives a clue as to the kind of information held. The data store is a generic representation of some physical or electronic data storage medium, such as index cards or a database file. Like external entities, data stores can be duplicated on the same diagram for the sake of clarity.
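The labelling conventions for process and data store symbols can be captured in a couple of small record types. This is a rough sketch rather than a definitive model: the classes `Process` and `DataStore`, the `caption` and `store_id` helpers, and the "Sales" department are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    number: int
    name: str
    location: str = ""  # optional person, department or place

    def caption(self):
        """Render the ID (and location, if any) followed by the process name."""
        header = f"{self.number} {self.location}".strip()
        return f"[{header}] {self.name}"


@dataclass(frozen=True)
class DataStore:
    number: int
    name: str

    @property
    def store_id(self):
        """Data store IDs carry an upper-case "D" prefix."""
        return f"D{self.number}"


enquiry = Process(2, "Manage enquiry", "Sales")  # "Sales" is a made-up department
quotations = DataStore(2, "Quotations")          # the store number is assumed

print(enquiry.caption())
print(quotations.store_id, quotations.name)
```

The numbers serve only as identifiers here, in keeping with the point that no priority is implied by the ordering of IDs.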
Let's see another example. A level 1 DFD for our simple library system might look like this:
The context diagram and level 1 DFD may well be re-drawn a number of times before a consensus is reached between developers and users that the diagrams accurately represent all of the high-level processes, data stores, and data flows. Looking at the above DFD, for example, it will become apparent that we do not currently have a data store for customer information. Once these diagrams are considered to be substantially correct, however, each of the high-level processes included in the level 1 DFD will probably require further analysis to break it down into its constituent sub-processes, resulting in a level 2 DFD being produced for each process shown on the level 1 DFD. A level 2 DFD for the "Manage enquiry" process is shown below.
Each (parent) process in the level 1 DFD will be decomposed into lower-level (child) sub-processes. The lower-level processes may be further decomposed if necessary, although it is unusual to have to do this beyond level 3, and often level 2 is sufficient. Note that the data flows in and out of a parent process must be evident in the child DFD. Note also that the ID number allocated to the parent process is carried down to each child process. In the example above, the "Manage enquiry" process has an ID number of 2, and the three sub-processes have the ID numbers 2.1, 2.2 and 2.3. Once a low-level process is considered to be a discrete task that is sufficiently atomic in nature, no further decomposition is necessary and an elementary process description (EPD) can be produced for each low-level process (see example below).
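Both the numbering convention and the requirement that parent data flows reappear in the child DFD can be illustrated with a few lines of code. The sketch below assumes each boundary flow is recorded as a (label, direction) pair; the function names and the flow labels are hypothetical.

```python
def child_ids(parent_id, count):
    """Derive child process IDs from the parent ID, e.g. 2 -> 2.1, 2.2, 2.3."""
    return [f"{parent_id}.{n}" for n in range(1, count + 1)]


def balance_report(parent_flows, child_flows):
    """Compare the flows crossing the parent process with those crossing the child DFD."""
    parent, child = set(parent_flows), set(child_flows)
    return {
        "missing_in_child": parent - child,
        "extra_in_child": child - parent,
    }


# Hypothetical boundary flows for the "Manage enquiry" process.
parent_flows = {("Enquiry", "in"), ("Quotation", "out")}
child_flows = {("Enquiry", "in")}

print(child_ids(2, 3))
print(balance_report(parent_flows, child_flows))
```

A non-empty "missing_in_child" set signals that the child DFD has dropped a flow shown on its parent, so one of the two diagrams needs to be corrected.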
When drawing data flow diagrams, you should attempt to comply with the following requirements:
Common errors in DFDs include the omission of data flows, showing data flowing in the wrong direction, connecting data stores and external entities directly to each other, and incorrectly labelling processes or data flows. As a general rule, a DFD should include no more than twelve processes. A lower-level (child) DFD should have the same data flows in and out as its parent process.
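Some of these errors can be caught automatically if the diagram is held as data. The following sketch assumes a DFD is described by a dictionary of node kinds and a list of (source, target, label) flows; the function `check_dfd` and its message wording are illustrative only. It flags any flow with no process at either end, which covers the case of a data store connected directly to an external entity, along with unlabelled flows and diagrams with more than twelve processes.

```python
MAX_PROCESSES = 12  # the rule-of-thumb limit mentioned above


def check_dfd(nodes, flows):
    """Return a list of problems found in a DFD.

    nodes: dict mapping node name -> "process", "store" or "entity"
    flows: list of (source, target, label) tuples
    """
    problems = []
    process_count = sum(1 for kind in nodes.values() if kind == "process")
    if process_count > MAX_PROCESSES:
        problems.append(f"too many processes: {process_count}")
    for source, target, label in flows:
        if not label.strip():
            problems.append(f"unlabelled flow: {source} -> {target}")
        if "process" not in (nodes[source], nodes[target]):
            problems.append(f"flow bypasses a process: {source} -> {target}")
    return problems


# Hypothetical fragment containing a deliberately faulty third flow.
nodes = {"Customer": "entity", "Manage enquiry": "process", "Quotations": "store"}
flows = [
    ("Customer", "Manage enquiry", "Enquiry"),
    ("Manage enquiry", "Quotations", "Quotation details"),
    ("Quotations", "Customer", ""),
]

for problem in check_dfd(nodes, flows):
    print(problem)
```

Errors such as a missing flow or a flow drawn in the wrong direction still need a human reviewer, since they depend on knowledge of the business that the diagram data alone does not contain.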
This article is adapted from material first published on the TechnologyUK.net website in January 2009.