
An Introduction to OOAD

Systems Analysis

Chris Wells, September 27, 2022

Object-Oriented Analysis and Design (OOAD) is a software development approach that models a system as a group of objects. Each object represents some real world entity within the system being modeled, and has its own attributes and operations. A range of models can be created using such objects to reflect the structure and behaviour of the system. One of the most widely used notations for depicting objects and the way they interact with each other is the Unified Modeling Language (UML). UML was adopted by the Object Management Group (OMG) in 1997 as the standard for object-oriented modelling, and combines widely accepted concepts from a number of object-oriented modelling techniques.

The line between analysis and design in OOAD is indistinct, but essentially object modelling is used to determine the functional requirements of a system (i.e. what the system must do) during the analysis phase, and the models thus derived are refined and augmented to produce a blueprint for the physical implementation (i.e. how the system will do what it must do) during the design phase.

The information fed into the analysis stage is derived by all the usual means, such as written requirements statements, interviews, observation, and a study of the system's documentation. The various diagrams produced during the analysis and design stages are easy to understand, and can be used to explain aspects of the proposed system design to users, and obtain feedback. The final set of design-phase diagrams reflect the chosen physical architecture, and take into consideration any constraints associated with the environment in which the system must operate, or the technical options available. They will be developed in sufficient detail to provide the necessary input to the implementation phase.

Objects and classes

Any discussion of object-oriented analysis and design rests on a good understanding of what objects and classes are, and the relationship between them. Essentially, an object is an abstract representation of some real-world entity, such as a person or a bank account. It will hold data items (attributes) about the entity (for example name or account number), and will define operations (or methods) to report the values held by attributes, or modify them in some way. The operations defined for a class provide it with an interface through which it can communicate with other objects. The object will be identified by a unique name. A class is a template that defines the attributes and operations common to the objects created from it. Looking at this from a different perspective, an object is an instance (a specific example) of a class. These ideas are common to both the object modelling notations used to analyse and design systems, and the object-oriented programming languages (such as C++ and Java) used to implement those systems.

A simple bank account class

The simple bank account class shown above is represented by a rectangle divided into three sections. The class name, "BankAccount", appears in the top section of the rectangle. The middle section contains the attributes for the class, "AccountNo" and "Balance", and the bottom section contains two operations, "ReturnBalance" and "UpdateBalance". This is the general format in which the Unified Modeling Language represents a class. As the analysis process continues, additional attributes and operations may emerge and be added to the model.
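The mapping from such a class diagram to an object-oriented language is direct. Here is a minimal sketch in Java; only the class, attribute, and operation names come from the diagram, while the attribute types and method bodies are assumptions made for illustration.

```java
// A minimal Java sketch of the BankAccount class from the diagram.
// The double type for Balance is an assumption for illustration.
class BankAccount {

    private final String accountNo; // "AccountNo" attribute
    private double balance;         // "Balance" attribute

    BankAccount(String accountNo, double openingBalance) {
        this.accountNo = accountNo;
        this.balance = openingBalance;
    }

    // "ReturnBalance" operation - reports the value of an attribute
    double returnBalance() {
        return balance;
    }

    // "UpdateBalance" operation - modifies an attribute
    void updateBalance(double amount) {
        balance += amount;
    }

    String getAccountNo() {
        return accountNo;
    }
}
```

The private attributes and public operations mirror the encapsulation implied by the diagram: other objects can only reach the balance through the interface the operations provide.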

Specialised classes can be derived from more generic classes. A "student" class, for example, would have the same general attributes (e.g. "name") as a "person" class, but would have additional attributes such as "student_number", and additional operations to manipulate those attributes. The idea of one class (a child class or subclass) deriving attributes and operations from another class (a parent class or superclass) is called inheritance. Inheritance is seen as a way of encouraging the re-use of code during the implementation phase, since existing attributes and operations are passed down from parent to child without needing to be explicitly redefined for the new (derived) class.

A class diagram illustrating inheritance

In the class diagram shown above, the triangle symbol indicates inheritance, with the apex of the triangle pointing towards the class from which operations and attributes are inherited. The "CurrentAccount" and "SavingsAccount" classes inherit the operations and attributes of the "BankAccount" class, so these operations and attributes, although not shown, are implicit within these classes. Note that we have added the operations "CalculateCharges" and "CalculateInterest" to both sub-classes. These operations could have been included in the "BankAccount" class and inherited, but there is reason to suppose that the implementation of the operations will be different for different types of account. In fact, one of the benefits of the modelling approach is that we can explore the various ways in which functionality can be implemented before we write any code. If we decide later that the "CalculateInterest" operation, for example, will be identical for all types of account, we can change the model to reflect this.
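The inheritance arrow corresponds directly to subclassing in an object-oriented language. In the Java sketch below, the interest rates and charges are invented purely to show that each subclass can implement the same operation differently while still inheriting the common attributes and operations.

```java
// The inheritance arrow in the diagram corresponds to "extends" in Java.
// Both subclasses inherit accountNo, balance, returnBalance and
// updateBalance; each supplies its own interest and charges calculation.
// The rates and fees used here are illustrative assumptions.
class BankAccount {
    protected String accountNo;
    protected double balance;

    BankAccount(String accountNo, double balance) {
        this.accountNo = accountNo;
        this.balance = balance;
    }

    double returnBalance() { return balance; }
    void updateBalance(double amount) { balance += amount; }
}

class CurrentAccount extends BankAccount {
    CurrentAccount(String accountNo, double balance) { super(accountNo, balance); }

    // current accounts pay a nominal rate of interest (assumed 0.1%)
    double calculateInterest() { return balance * 0.001; }

    // current accounts attract a flat charge (assumed)
    double calculateCharges() { return 5.00; }
}

class SavingsAccount extends BankAccount {
    SavingsAccount(String accountNo, double balance) { super(accountNo, balance); }

    // savings accounts pay a higher rate (assumed 3%)
    double calculateInterest() { return balance * 0.03; }

    // savings accounts attract no charges (assumed)
    double calculateCharges() { return 0.00; }
}
```

If analysis later shows that interest is calculated identically for all account types, the operation would simply move up into BankAccount, exactly as the model change described above suggests.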

Modelling the system structure

Modelling languages like UML provide a way to create a model of a system that can be understood by both system developers and users, and can form the basis for an ongoing dialogue between these two groups of people. Everything within the system of interest is described in terms of objects, the properties of those objects, how objects can act and be acted upon, and the relationships that exist between objects. The models themselves consist primarily of diagrams, and the various types of diagram used employ a consistent notation throughout to maintain clarity. The aspects of a system that can be represented diagrammatically include its structure, which for the most part is relatively static, and its behaviour, which is dynamic and often complex.

The analysis phase looks at the existing system using a fairly high level of abstraction to determine the requirements of the system to be developed, without considering how those requirements will be implemented. It usually starts with a process of identifying objects within the system of interest, their attributes, and the relationships between them. One way of achieving this is to examine existing system documentation, such as invoices or purchase orders, and pick out the nouns used (e.g. "customer", "supplier", "account"). Suitable candidates will be chosen from the list to be modelled as classes using a class diagram of the type we have already seen. Here is a slightly more developed model of our simple bank system:

The banking system model

We have now added four more classes to the model, and have created a number of new relationships. These new classes and relationships are described below:

  • A Branch is administered by one (and only one) HeadOffice. A HeadOffice, in contrast, may administer many instances of Branch. This one-to-many relationship is denoted by the number one (representing the one end of the relationship) appearing at one end of the line that connects the two entities, while an asterisk (representing the many end of the relationship) appears at the other end. Other types of relationship are possible, such as one-to-one, and many-to-many, but we will not consider them here.
  • A Branch may hold many instances of Account, but each Account can be held by one (and only one) Branch.
  • Each Account belongs to one (and only one) Customer, but a Customer may have many instances of Account.
  • The belongsTo relationship between Account and Customer is inherited by the CurrentAccount and SavingsAccount classes.
  • Two relationships exist between Account and Transaction - debit and credit. Each instance of Transaction relates to either one or two instances of Account. Cash withdrawals or deposits involve only one account per transaction, but other transactions (for example, direct debits or standing orders) will involve two accounts, one from which money is debited, and the other to which money is credited. The bank system should be able to keep track of the type of transaction being carried out, and which account(s) are involved in each transaction (note, however, that the attributes needed to store this data do not yet exist).
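These multiplicities translate naturally into code: the "many" end of a relationship becomes a collection, while the "one" end becomes a single reference. The Java sketch below follows the class names in the model, but the methods themselves are illustrative assumptions.

```java
// One way the one-to-many relationships in the model translate into code:
// the "many" end is a collection, the "one" end a single reference.
import java.util.ArrayList;
import java.util.List;

class HeadOffice {
    private final List<Branch> branches = new ArrayList<>(); // administers *

    void administer(Branch b) { branches.add(b); }
    List<Branch> getBranches() { return branches; }
}

class Branch {
    private final List<Account> accounts = new ArrayList<>(); // holds *

    void hold(Account a) { accounts.add(a); }
    List<Account> getAccounts() { return accounts; }
}

class Customer {
    private final List<Account> accounts = new ArrayList<>(); // has *

    void addAccount(Account a) { accounts.add(a); }
    List<Account> getAccounts() { return accounts; }
}

class Account {
    private final Customer owner; // belongsTo exactly one Customer

    Account(Customer owner) {
        this.owner = owner;
        owner.addAccount(this);
    }
    Customer getOwner() { return owner; }
}
```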

Class diagrams depict the structure of a system. They identify the classes used by the system, and define the attributes of each class, the operations they can perform, and the dependencies and relationships that exist between them. The class diagrams derived for the system will continually evolve throughout the analysis and design process as the requirements of the system become more clearly understood. The task of translating class diagrams into code is greatly facilitated by the use of an object-oriented programming language such as C++ or Java. Solid lines depict an association, whereas dashed lines depict a dependency.

Other types of UML diagram used to represent the structure of the system include:

  • Component diagrams - a component is a collection of closely related classes that provide a specific set of services. A component diagram shows the structure of (usually) a single component within the overall system architecture. Each component diagram depicts the detailed class structure of a component, including the class attributes and operations, and the dependencies and relationships between classes.
  • Package diagrams - a package diagram shows the hierarchical structure of the system design. It illustrates how related system elements are grouped, and the dependencies and relationships between the various groupings. Package diagrams can be used to break a large and complex design down into a number of smaller and more easily managed designs.
  • Deployment diagrams - the deployment diagram shows the physical architecture of the system in terms of its implementation in hardware, how the system components are assigned to the various nodes, and how the nodes communicate with each other and with other hardware devices.

Modelling the system's behaviour

The behaviour of the system is modelled dynamically. You can think of it in terms of creating a series of stories, each of which describes some aspect of how objects behave and how they interact with one another. A number of diagramming techniques are provided by UML to model the events that occur within the system, what actors are responsible for instigating them, and the actions that they trigger. We are interested in the interactions between objects, the transitions that occur, and the sequencing of events. Modelling these dynamic characteristics can often uncover additional system requirements, and result in further refinement of the system's logical or physical architecture models.

Use Case diagrams

Use-case diagrams model the system's functions from the perspective of various actors. An actor is an external entity that makes use of the system in one or more ways. The actor is usually a person, but may be an organisation or another computer system. Actors may also (though not necessarily) be represented within the system by a class. The use-case models how an actor uses the system in one particular way, and the functionality provided by the system for that situation. A use-case may involve more than one actor, and an actor may be involved in a number of use-cases (though often not at the same time). Actors are represented diagrammatically by stick men, while the use-cases they relate to are represented by an ellipse, as shown below.

A simple use-case diagram

A use-case is an abstract view of the services provided by the system to the actors that use it. It does not specify how those services are provided. Use-cases are used in the analysis phase to determine the system requirements. As the requirements become clearer, the object model is updated in parallel, to ensure that all the necessary classes, attributes, operations, dependencies and relationships are in place. Essentially, each use-case in a system represents one way in which one actor interacts with the system. The more actors there are, and the more ways in which each actor can interact with the system, the more use cases will need to be considered.

The use-case diagram below illustrates some of the business functions that might be required by our bank system. It shows, for example, that customers may use the system to make deposits or withdrawals, check their balance, or take out a loan. Cashiers may be involved with deposits or withdrawals, and in the case of the latter, will need to check the account balance to ensure sufficient funds are available. In both cases, they will need to update the account balance. Managers may be involved in arranging loans for customers.

A use-case diagram for a simple bank system

Use-cases can be used to break a system down into atomic business functions, which can serve both as units for costing and as units for planning a phased system delivery. Each phase of delivery would deliver a number of use-cases. A scenario is a specific sequence of actions that represents a single instance of a use-case. For example, in a real banking system a customer may withdraw cash from an ATM, over the counter in the bank, or as cash-back from a shop or supermarket. For that matter, many banking transactions such as paying bills, transferring funds, or setting up direct debits and standing orders can now be carried out using online banking services, so for a real banking system there would be a great many scenarios we might need to consider.

A use-case diagram is often accompanied by a brief textual description for each use case shown. For example, the withdraw cash use-case might be described as:

A customer uses the system to withdraw a specified sum of money from a specified account.

The use-case is often also accompanied by a set of formally documented requirements, scenarios, and constraints. Requirements are specified in terms of the functionality that the use-case will provide, the inputs to the use-case, and the outputs from the use-case. Scenarios are formal descriptions of the flow of events during execution of an instance of a use-case, and are directly related to sequence diagrams (see below). Constraints are restrictions imposed on the occurrence of a use-case, and specify what pre-conditions must be true before the use-case can proceed, what post-conditions must be true following the execution of the use case, and what invariant conditions must be true during the execution of the use case.
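As a sketch of how such constraints might eventually be enforced in code, the "withdraw cash" use-case below checks its pre-condition (sufficient funds) before proceeding and asserts its post-condition (balance reduced by exactly the amount withdrawn) afterwards. The names and the rejection behaviour are assumptions made for illustration.

```java
// Illustrative enforcement of use-case constraints in Java:
// the pre-condition guards entry, the post-condition is asserted on exit.
class Account {
    private double balance;

    Account(double openingBalance) { balance = openingBalance; }

    double getBalance() { return balance; }

    void withdraw(double amount) {
        // pre-condition: a positive amount not exceeding the balance
        if (amount <= 0 || amount > balance) {
            throw new IllegalStateException("pre-condition violated");
        }
        double before = balance;
        balance -= amount;
        // post-condition: balance reduced by exactly the amount withdrawn
        assert balance == before - amount;
    }
}
```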

Other types of UML diagram used to represent the behaviour of the system include:

  • Activity diagrams - an activity diagram elaborates on the behaviour of the system by modelling the detailed behaviour of an atomic element of functionality (represented in the first instance by a use-case). The diagram may include a primary scenario and a number of alternate scenarios, and is useful for deriving a complete understanding of the functionality of a given use-case.
  • Sequence diagrams - a sequence diagram shows how objects interact over time. Each sequence diagram relates to a single use-case, and is useful for understanding the flow of messages between objects.
  • Statechart diagrams - a statechart diagram shows how the state of a system changes in response to events, and is useful for making sure that all events are handled properly, whatever state the system is in. External events (for example, a mouse button click) trigger a response in a system object. The object may generate further (internal) events as part of that response, which triggers a response from another object. A statechart diagram models a set of states that an object can be in, and the events which take it from one state to another. An event usually corresponds to an operation in the object model, while a state corresponds to the values held by the object's attributes. Statechart diagrams allow you to determine the operations and attributes that need to be defined for an object by exploring the events that can change an object's state, and the attributes affected by those events.
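The statechart idea can be sketched in code as a small state machine: an enum of states, an enum of events, and a transition method that maps each (state, event) pair to a new state. The account states and events used below are invented for illustration and are not taken from any diagram in this article.

```java
// A minimal state machine for an account: states, events, transitions.
// Unhandled (state, event) pairs simply leave the state unchanged.
class AccountStateMachine {
    enum State { OPEN, FROZEN, CLOSED }
    enum Event { FREEZE, UNFREEZE, CLOSE }

    private State state = State.OPEN;

    State getState() { return state; }

    void handle(Event event) {
        switch (state) {
            case OPEN:
                if (event == Event.FREEZE) state = State.FROZEN;
                else if (event == Event.CLOSE) state = State.CLOSED;
                break;
            case FROZEN:
                if (event == Event.UNFREEZE) state = State.OPEN;
                else if (event == Event.CLOSE) state = State.CLOSED;
                break;
            case CLOSED:
                // no events are handled once the account is closed
                break;
        }
    }
}
```

Walking through a table like this for every state is exactly the "make sure all events are handled properly, whatever state the system is in" check that statechart diagrams support.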

Activity diagrams

An activity diagram is a bit like a flowchart for a computer program, but in this case it represents activities within a system, and the events that cause objects to be in a particular state. The following simple activity diagram illustrates what might be involved in a student attending college for a class at 9 a.m.

A simple activity diagram

The first activity is to get dressed. A decision then has to be made, depending on how much time is available before the class begins and the times of local buses. If there is enough time to get to college by bus, take the bus. If not, take a taxi. The last activity is to actually attend the class, after which the activity diagram terminates. The diagram commences at the initial node (drawn as a solid black circle), and ends at the terminating node (drawn as a bull's eye). Activities are drawn as rounded rectangles containing a brief description of the activity. A decision node is shown as a diamond, with arrows emerging from it to represent the alternative control flows. Each alternative is labelled to show the options that determine which flow is followed.
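In code, the decision node amounts to nothing more than a conditional on the two labelled flows. A trivial Java sketch, with the timing rule invented for illustration:

```java
// The decision node from the activity diagram as a conditional:
// each labelled outgoing flow becomes one branch.
class JourneyPlanner {
    static String choose(int minutesUntilClass, int minutesByBus) {
        // decision: is there enough time to get to college by bus?
        return minutesByBus <= minutesUntilClass ? "bus" : "taxi";
    }
}
```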

Now we will look at another example that deals with the activity of managing course information in a courseware management system (such as Moodle, for example). In order to manage course information, the course administrator carries out the following activities:

  • Check whether the course exists
  • If this is a new course, go to the "Create course" step
  • If the course already exists, check what action is to be taken (either modify or remove the course)
  • If the course is to be modified, the "Modify course" activity is performed
  • If the course is to be removed, the "Remove course" activity is performed

A more complex activity diagram

Note the fork and join nodes, each represented by a heavy black bar. A fork node splits a workflow into multiple concurrent flows, while a join node (drawn with the same symbol) combines multiple concurrent flows back into one, as in this case. The "Course" object is included in the activity diagram, since its state is affected by each of the activities shown.

This article was first published on the website in January 2009.

Entity Life Histories

Systems Analysis

Chris Wells, September 27, 2022

The purpose of an information system is to provide up-to-date and accurate information. The information held on the system is constantly changing - the number and names of the patients on a hospital ward, for example, or the price of electronic components. The system must be able to keep track of such changes. An entity life history (ELH) is a diagrammatic method of recording how information may change over time, and models the complete catalogue of events that can affect a data entity from its creation to its deletion, the context in which each event might occur, and the order in which events may occur. The ELH represents every possible sequence of events that can occur during the life of the entity. Remember that, although data is changed by a system process, the occurrence of that process is triggered by some event.

It would obviously be an overwhelming task to model all of the events that could affect the system's data at the same time, so instead we examine just one entity within the logical data structure at a time. An entity life history will be produced for each entity in the logical data structure. Information from the individual life histories can be collated at a later time to produce an entity/event matrix.

The diagram below shows how we might model the life history of a bank account entity.

An entity life history for the "Bank Account" entity

The entity life history for "Bank Account" should accommodate any possible occurrence of that entity. All bank accounts must be opened, and money is either paid in or withdrawn. The diagram itself is read from left to right. If the structure branches downward, the branch must be followed down before moving on towards the right-hand side of the diagram. The first event to affect any occurrence of "Bank Account" will be the opening of the account. The account will have a life, which will consist of a series of transactions. Transactions can include the deposit or withdrawal of funds, direct payments, or the cashing of cheques. After an unspecified number of transactions have occurred, the account will be closed, and eventually deleted. The entity life history elements featured in the above example are:

  • Sequence - activities are undertaken in strict sequence, from left to right (for example, an account must be opened before any other event that will affect it can occur, and account closure must occur before account deletion).
  • Iteration - the asterisk in the top right-hand corner of the Transaction box signifies that a transaction is an event that can occur repeatedly.
  • Selection - boxes with small circles in their top right-hand corner represent alternative forms of transaction. A single transaction may be a deposit or a withdrawal of funds, a direct payment, or the cashing of a cheque.
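These three constructs correspond to the operators of regular expressions (sequence to concatenation, selection to alternation, iteration to the star operator), so a simple life history can be checked mechanically. Here is a Java sketch using invented single-letter event codes for the bank account example.

```java
// Sequence maps to concatenation, selection to alternation (|),
// iteration to the star (*). The single-letter event codes are
// invented for illustration: O = open, D = deposit, W = withdrawal,
// P = direct payment, C = cheque cashed, X = close, K = delete.
import java.util.regex.Pattern;

class BankAccountLifeHistory {
    // open, then any number of transactions, then close, then delete
    private static final Pattern VALID = Pattern.compile("O(D|W|P|C)*XK");

    static boolean isValid(String events) {
        return VALID.matcher(events).matches();
    }
}
```

Any string of events the pattern rejects represents a sequence the entity life history does not permit, such as a transaction occurring before the account is opened.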

In the same way that an entity may be affected by several different events, a single event may affect more than one entity. When an instance of "Bank Account" is created, for example, an instance of "Customer" must also be created. The interaction between an event and an entity is called an effect. Notice that elements that have an effect on an entity have no other elements below them in the entity life history diagram. Elements that do have other elements below them are called nodes. They have no significance other than specifying the sequence in which events may occur within the context of the entity's life history. The name of each element (shown as a label inside the box representing the entity) reflects the event affecting the entity (if the element is an effect) or a particular stage within the life history (if the element is a node).

Although an entity life history can be constructed using only the elements sequence, iteration and selection, the representation of certain complex scenarios can be greatly simplified using two additional element types:

  • parallel structures
  • quit and resume


Sequence

A sequence consists of a series of nodes and/or effects reading from left to right, as shown below.

Boxes A, B, C, and D represent a sequence

In the above example, effect A will always occur first, followed by B, then C, then D. This is the only possible sequence. Although these sequential events will take place over a period of time, the time intervals involved are unspecified, and could range from a few seconds to many years.


Selection

A selection defines a number of nodes or effects that are alternatives to one another at a particular point in the entity's life history. A circle in the top right-hand corner of the box representing an element indicates that it is one of several elements that could be chosen.

Boxes E, F and G represent the available options

Because node A is the first element in the entity life history of "Entity X", an occurrence of the entity can only be created by event E, F or G. If we want to represent the fact that none of the available options have to be selected, we can include a null box, as shown below.

A null box indicates that an option does not need to be selected

Iteration

If an event or node can occur repeatedly at the same point within an entity's life history, this is signified by an asterisk in the top right-hand corner of the box representing the event or node. The only restriction on iterations is that one occurrence of the iteration must be complete before the next one starts.

Event H may occur repeatedly

Once "Entity X" has been created by event E, F or G under node A, event H can affect the entity zero or more times. The iteration symbol must not be used for events or nodes that occur only once, or not at all (use the null box instead).

Parallel structures

A parallel structure can be used if the sequence in which two nodes or events can occur is unpredictable, or where they may occur concurrently. Such a structure is shown as two nodes or events connected by parallel horizontal lines, as illustrated below.

Nodes I and J form a parallel structure

In the entity life history above, the events K, L and M may occur, in that order, under node I. At the same time event N, under node J, may occur zero or more times. Nodes I and J (representing the sequence and the iteration respectively) are connected by a parallel bar to signify possible concurrency. The same situation could be modelled using only sequence, iteration and selection elements, but the resulting diagram would be far more complex and consequently more difficult to interpret.

Quit and resume

Occasionally, a situation can arise that cannot easily be modelled using the entity life history elements already described. In order to accommodate such situations without making the entity life history diagram unduly large or complex, the quit and resume facility allows the sequential progress of nodes or events to quit at one point in the entity life history and resume at another point. This concept is illustrated below.

Following event F, activity will continue at node C

In the above example, event F has the label "Q1" immediately to its right, and node C has the label "R1" immediately to its right. Using this notation, we can signify that the event or node that will follow event F is whichever element has the label R1 to its right, which in this case is node C. As with parallel structures, the same situation could be modelled using only sequence, iteration and selection elements, but the resulting diagram would be more complex and difficult to interpret.

The example below shows how we can model a situation in which a bank account has been closed (but not deleted), and is then re-opened. The event Account Reopened (labelled Q1) causes a quit back to the node Account Life (labelled R1).

Two possible uses of quit and resume

The quit and resume facility also allows us to quit from the main structure altogether, and resume at a point in a stand-alone structure. This can be used in situations where an event that can occur at any time will alter the normal sequence of the life history. Since it is impossible to predict exactly where such an event might occur within the entity's life history, an appropriate instruction should be added to the diagram indicating the circumstances under which the quit might occur, and from where. In the above example, the death of a customer may occur at any time after an account is opened, triggering an immediate quit, followed by a resume at R2 (Death Structure). In circumstances such as the death of a customer, the normal sequence will no longer apply.

Note that it is also possible to quit from a stand-alone structure back to the main structure in an entity life history. To avoid ambiguity, while there may be more than one quit point with the same identifier, there cannot be more than one resume point with that identifier.

This article was first published on the website in January 2009.

Logical Data Modelling

Systems Analysis

Chris Wells, September 27, 2022

In order to design an information system, it is necessary to know what information is to be held and how it should be organised. To this end, we can create a logical data structure (LDS) that allows us to see how related data items are grouped together, and how the various groups (or entities) relate to each other. Entities represent things about which we store information, such as customers, suppliers, orders, or employees. In our model, the entity "Customer" will be used to represent all customers about whom we hold information. An entity is shown on a logical data structure diagram as a rectangular box, with the name of the entity inside it.

The data items that make up an entity are referred to as attributes. An attribute is a discrete item of information about an entity, such as age, surname, or product code. Each attribute will have its own data type, depending on what kind of information it represents. Because we need to be able to uniquely identify every instance of an entity, one of the attributes of an entity (or possibly a combination of two or more attributes) must be unique for that entity within our system. For an occurrence of the entity "Customer", for example, we might have a unique alphanumeric code. An attribute (or combination of attributes) that is used in this way is known as a primary key.
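In code, a primary key corresponds to the key of a map or index: every occurrence of the entity can be retrieved by its key, and duplicate keys can be detected. The Java sketch below is illustrative; the customer code format and the use of a surname as the only stored attribute are assumptions.

```java
// A primary key in code: occurrences of Customer are stored in a map
// keyed on the unique customer code.
import java.util.HashMap;
import java.util.Map;

class CustomerStore {
    private final Map<String, String> customers = new HashMap<>();

    // returns false if the primary key is already in use
    boolean add(String customerCode, String surname) {
        return customers.putIfAbsent(customerCode, surname) == null;
    }

    String find(String customerCode) {
        return customers.get(customerCode);
    }
}
```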

Entities have relationships with other entities. In a hospital information system, for example, we might hold information about wards, and the patients in those wards. Both patients and wards can be represented in our logical data structure by entities, and there is clearly a relationship between the two. For any given patient, our information system should be able to tell us in which ward that patient is being accommodated. Conversely, for any given ward, the system should be able to provide us with a list of patients. The relationship between patients and wards is illustrated below.

A one-to-many relationship

The relationship shown in the above LDS is a one-to-many relationship, in that one occurrence of "Ward" may have many occurrences of "Patient" associated with it. The crow's foot is used to signify the 'many' end of the relationship. The entity at the 'one' end is usually referred to as the master entity, while the entity at the 'many' end is referred to as the detail entity. One-to-many relationships are by far the most common type of relationship to be found in a logical data structure. There are three possible degrees of relationship:

  • one to many
  • many to many
  • one to one

Relational database management systems cannot implement many-to-many relationships directly, so such relationships are to be avoided when creating a logical data structure. Consider the logical data structure shown below, which represents the relationship between patients and drugs in a hospital.

A many-to-many relationship

A patient may be prescribed any number of different drugs, and a drug may be prescribed to any number of different patients, so we would seem to have a many-to-many relationship. If we consider the prescription itself to be an entity, however, we can create a somewhat different model. A prescription will have attributes such as the prescription date, the name of the drug prescribed, and the frequency and dosage to be taken. By definition, a prescription can only be for one patient, although a patient may have any number of prescriptions. We will also stipulate that the definition of a prescription means that it can only be for one drug, although obviously the same drug may appear on many prescriptions. We can now produce a revised LDS, as shown below.

Patient and Drug are linked via Prescription

The many-to-many relationship has been replaced by two separate one-to-many relationships. A patient may have many prescriptions, but a prescription can only be for one patient. Similarly, a drug can be prescribed by many prescriptions, but a prescription can only be for one drug. The entity "Prescription" is thus a detail entity for two relationships. We call an entity that links together other entities in this way a link entity. Wherever many-to-many relationships exist, we should be looking to create a link entity and two one-to-many relationships.
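The link entity translates naturally into a class that holds exactly one reference to each of its master entities. In the Java sketch below, a Prescription is for exactly one Patient and one Drug, while each Patient or Drug may appear on many prescriptions; the attribute names are illustrative.

```java
// The link entity in code: Prescription is the detail end of two
// one-to-many relationships, holding one reference to each master.
import java.util.ArrayList;
import java.util.List;

class Patient {
    final List<Prescription> prescriptions = new ArrayList<>();
}

class Drug {
    final List<Prescription> prescriptions = new ArrayList<>();
}

class Prescription {
    final Patient patient; // exactly one patient per prescription
    final Drug drug;       // exactly one drug per prescription
    final String dosage;

    Prescription(Patient patient, Drug drug, String dosage) {
        this.patient = patient;
        this.drug = drug;
        this.dosage = dosage;
        patient.prescriptions.add(this);
        drug.prescriptions.add(this);
    }
}
```

This is precisely the structure a relational database would use: a junction table whose rows each carry one foreign key to Patient and one to Drug.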

One-to-one relationships are also often undesirable in a logical data structure, although there are obviously examples of groups of information that are related to one another in this way. Take the example of "Employee" and "Curriculum Vitae".

A one-to-one relationship

Obviously, since an organisation should hold only one curriculum vitae for each employee, and a curriculum vitae will certainly only concern one employee, there is a one-to-one relationship between the two. In most cases, however, the two entities can be merged to create a single entity.

We should also consider the question of whether a relationship must always exist between two entities, or whether the existence of a relationship is either optional or dependent on other factors. We may, for example, have an entity called "Customer" to represent organisations or individuals to whom we are intending to sell something. We will have another entity called "Order" to represent the sales orders which we hope will be generated. There will obviously be a one-to-many relationship between "Customer" and "Order", since an order can only relate to one customer, but a customer may place any number of orders. At some point, however, we may have customers on our books that have not yet placed an order, perhaps because they are still evaluating a quotation. So, while an occurrence of "Order" cannot exist without a related occurrence of "Customer", the reverse is not true.

In any one-to-many relationship between a master entity (A) and a detail entity (B), there are four possible states in which the relationship between these two entities could exist:

  1. Mandatory at both ends - if an occurrence of A exists, it must be associated with at least one occurrence of B. If an occurrence of B exists, it must be associated with one (and only one) occurrence of A.
  2. Optional at the master end and mandatory at the detail end - if an occurrence of A exists, it may be associated with one or more occurrences of B. If an occurrence of B exists, it must be associated with one (and only one) occurrence of A.
  3. Mandatory at the master end and optional at the detail end - if an occurrence of A exists, it must be associated with at least one occurrence of B. If an occurrence of B exists, it may be associated with one (and only one) occurrence of A.
  4. Optional at both ends - if an occurrence of A exists, it may be associated with one or more occurrences of B. If an occurrence of B exists, it may be associated with one (and only one) occurrence of A.

In practice, optional relationships are generally considered to be those in which an occurrence of the detail entity can exist without its corresponding master. In a one-to-many relationship, the connecting line between the master and detail entities is usually shown as a solid line if the relationship is mandatory, and a broken line if it is optional.

There are four possible states for one-to-many relationships

Sometimes, it is possible for an entity to have a relationship with one entity that precludes a relationship with another entity, or vice versa. Consider the LDS below. The detail entity B can be in a relationship with master entity A or with master entity C, but not with both. The arc on each of the connecting lines (identified in this case by the letter a) indicates that these relationships are mutually exclusive.

Mutually exclusive relationships

In some cases, multiple relationships can exist between the same pair of entities, where the only other way of avoiding a many-to-many relationship would be to create an artificial link entity. Think about trying to model the relationship between teams in a local football league and fixtures. Every team in the league will have many fixtures, and each fixture will involve two teams. We thus appear to have a many-to-many relationship between "Team" and "Fixture".

Rather than create an artificial link entity to eliminate the many-to-many relationship, we can use the fact that only two teams can be involved in any one fixture and create two one-to-many relationships as shown below. When two or more relationships exist between entities in this way, it is a good idea to label them to avoid confusion.

Entities can be linked by multiple relationships

Logical data structure diagrams should be easy to understand. They should, as far as possible, be drawn so that the lines representing relationships do not cross. Entities that relate to the same area of business activity should be grouped together (this will often occur naturally, since they will tend to have relationships with one another). One of the conventions used when drawing the LDS is to try to place the crow's feet so that they appear on the top of the detail entity in a one-to-many relationship. This makes the diagram easier to follow, since master entities will always be above their detail entities.

The LDS should be validated against the data flow diagrams to ensure that all of the processing shown in the DFDs is fully supported. The data items required for the processing must be included in entities, and the relationships between entities must be in place to enable all of the required data retrieval operations.

The documentation required to support the LDS will include an entity description for each entity in the LDS. The description should include the names of any master and detail entities with which the entity being described has a relationship, volumetric information about the entity (the average and maximum number of occurrences for the entity), a list of the entity's attributes, and the identity of the attribute (or attributes) used as the entity's primary key.

This article was originally published on the website in January 2009.

Relational Data Analysis

Systems Analysis

Chris Wells, September 27, 2022

Relational data analysis is a technique, based on relational database theory, that puts forward the idea that all data can be held in tables, otherwise known as relations. Relational analysis is about how data can best be organised into relations. For a given dataset, relational analysis will produce a set of tables (or relations) that between them represent all of the data. A relation is equivalent to an entity (see the article on logical data modelling). The illustration below shows two tables that hold information about hospital patients and wards.

The "Patient" and "Ward" relations represented as tables

A row in the "Patient" table represents a single patient record. Each row must be uniquely identifiable, and there must be no duplicate rows. A combination of all of the attributes in any one row might be required to uniquely identify the row, although in practice it is often only necessary to use one attribute, or at most a small subset of the attributes, for this purpose. The value (or combination of values) used to uniquely identify a row is known as the primary key. For any relation (or table), each primary key must be unique, and a primary key must exist for every row. No significance is attached to the order in which the rows appear.
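These row rules lend themselves to a mechanical check. A minimal sketch in Python, assuming each relation is held as a list of dictionaries (the helper name and field names are ours):

```python
def check_rows(rows, key_fields):
    """Check primary key uniqueness and the no-duplicate-rows rule."""
    seen_keys = set()
    seen_rows = set()
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # A primary key must exist for every row...
        assert all(value is not None for value in key), "missing primary key"
        # ...and each primary key must be unique.
        assert key not in seen_keys, f"duplicate primary key {key}"
        seen_keys.add(key)
        frozen = tuple(sorted(row.items()))
        assert frozen not in seen_rows, "duplicate row"
        seen_rows.add(frozen)

patients = [
    {"pat_no": 454, "surname": "Smith", "ward_no": 6},
    {"pat_no": 223, "surname": "Jones", "ward_no": 8},
]
check_rows(patients, ["pat_no"])  # passes: keys unique, no duplicate rows
```

Because the rows are held in a set-like structure for comparison, the check is also indifferent to row order, matching the rule that no significance attaches to it.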

Each column in the table represents one of the table's attributes, and is given an appropriate heading. For each row and column intersection, only one value can be entered, and the values in a particular column will all have the same data type. As with rows, no significance attaches to the order in which the columns appear. By convention, however, the primary key columns normally occupy the leftmost positions.

The term domain is used to describe the pool of values from which any one column may be drawn. The domain of Patient No. includes all possible patient numbers, not just the ones currently in the hospital. The domain of Sex has only two possible values, male (M) and female (F). The value of domains lies in being able to compare values from different tables that represent the same thing. If we want to find the name of the ward a particular patient is in, for example, we can compare the values of Ward No. in the "Patient" table with the value of Ward No. in the "Ward" table. This only really works if the values being compared come from the same pool of values.
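Looking up a patient's ward name in this way is just a join on the shared Ward No domain. Sketched with plain Python dictionaries, using sample values from the hospital data:

```python
# "Ward" relation: Ward No -> Ward Name
wards = {6: "Bracken", 8: "Meavy"}

# "Patient" relation: Patient No -> remaining attributes
patients = {
    454: {"surname": "Smith", "ward_no": 6},
    223: {"surname": "Jones", "ward_no": 8},
}

def ward_name_for(pat_no):
    # Comparing ward_no here with the keys of wards is only meaningful
    # because both are drawn from the same domain of ward numbers.
    return wards[patients[pat_no]["ward_no"]]

print(ward_name_for(454))  # Bracken
```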

Normalised relations

The object of relational data analysis is to organise all of the data items used by the system into a set of well normalised relations. This means that we eliminate certain undesirable properties, such as unnecessary duplication (redundancy) of data items in different relations, and the possibility of problems arising when we modify, insert or delete data (usually referred to as update anomalies). The term normalisation is sometimes used in place of relational data analysis. Although there have been intensive studies in the past that have uncovered up to fourteen possible levels of normalisation, we are usually only concerned with the first four levels, which are:

  • Un-normalised form (UNF)
  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)

The "normalisation" of data rarely proceeds beyond the third normal form.

In order to carry out the normalisation process, it is necessary to collect enough sample data to ensure that all of the data items involved in the system's processing are listed. To this end it is often a good idea to analyse the documentation used by the various system processes. For our hospital system, for example, we can glean a lot of useful data from the prescription record cards.

Samples of data are collected from prescription record cards

Un-normalised data

The first thing to do is to tabulate the data in un-normalised form, and choose a suitable key. Column headings are effectively attribute names, and as such should be meaningful but not too long-winded. The key attribute (or attributes) chosen must provide a unique key for the data source. If the key has to consist of more than one attribute (i.e. it is a compound key), use the smallest number of attributes possible. Also avoid using textual keys if practical to do so. For the un-normalised data below, we have chosen Patient Number as the primary key. The value (or values) chosen to be the primary key is underlined to indicate its status.

Un-normalised data

| Pat No | Surname | Forename | Ward No | Ward Name | Presc Date | Med Code | Med Name   | Dosage                      | Lgth Treat |
|--------|---------|----------|---------|-----------|------------|----------|------------|-----------------------------|------------|
| 454    | Smith   | John     | 6       | Bracken   | 11.11.08   | CO5768   | Cortisone  | 2 pills 3 x day after meals | 4 days     |
|        |         |          |         |           | 18.11.08   | MO1234   | Morphine   | Injection 4 hourly          | 4 days     |
|        |         |          |         |           | 22.11.08   | MO1234   | Morphine   | Injection 8 hourly          | 2 days     |
|        |         |          |         |           | 23.11.08   | PE9876   | Penicillin | 1 pill 3 x day              | 7 days     |
| 223    | Jones   | Peter    | 8       | Meavy     | 11.11.08   | AS2233   | Aspirin    | 2 pills 3 x day after meals | 7 days     |
|        |         |          |         |           | 15.11.08   | VA7867   | Valium     | 2 per day                   | 5 days     |

First normal form (1NF)

To get the data into first normal form, it is necessary to remove any repeating groups of data to separate relations. You should then choose keys for each new relation identified. A repeating group is a data item or group of data items that occurs with multiple values for a single value of the primary key. In the table above, we can see several values for Presc Date, Med Code, Med Name, Dosage and Lgth Treat. These items are a repeating group and are removed to a separate relation. Note that Pat No is still required to ensure that each row is unique (two patients could conceivably be given the same prescription on the same day). Pat No, Presc Date and Med Code together form the compound primary key for the new relation.

First normal form [1]

| Pat No | Presc Date | Med Code | Med Name   | Dosage                      | Lgth Treat |
|--------|------------|----------|------------|-----------------------------|------------|
| 454    | 11.11.08   | CO5768   | Cortisone  | 2 pills 3 x day after meals | 4 days     |
| 454    | 18.11.08   | MO1234   | Morphine   | Injection 4 hourly          | 4 days     |
| 454    | 22.11.08   | MO1234   | Morphine   | Injection 8 hourly          | 2 days     |
| 454    | 23.11.08   | PE9876   | Penicillin | 1 pill 3 x day              | 7 days     |
| 223    | 11.11.08   | AS2233   | Aspirin    | 2 pills 3 x day after meals | 7 days     |
| 223    | 15.11.08   | VA7867   | Valium     | 2 per day                   | 5 days     |

The data is now represented by two tables (or relations) in first normal form. The data items that do not repeat for a single value of the primary key selected for the un-normalised data remain together as a relation, as shown below, and the primary key remains as Pat No.

First normal form [2]

| Pat No | Surname | Forename | Ward No | Ward Name |
|--------|---------|----------|---------|-----------|
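The 1NF step can be sketched mechanically in Python. The nested record layout below is our own assumption about how the un-normalised source data might be held:

```python
# Un-normalised records: a repeating prescription group inside each patient.
unf = [
    {"pat_no": 454, "surname": "Smith", "forename": "John",
     "prescriptions": [
         {"presc_date": "11.11.08", "med_code": "CO5768", "med_name": "Cortisone"},
         {"presc_date": "18.11.08", "med_code": "MO1234", "med_name": "Morphine"},
     ]},
]

patients_1nf = []
prescriptions_1nf = []
for record in unf:
    # Non-repeating items stay together, keyed on Pat No alone.
    patients_1nf.append(
        {k: v for k, v in record.items() if k != "prescriptions"})
    # The repeating group moves to its own relation; Pat No is carried
    # across so that each row remains uniquely identifiable.
    for presc in record["prescriptions"]:
        prescriptions_1nf.append({"pat_no": record["pat_no"], **presc})

print(len(patients_1nf), len(prescriptions_1nf))  # 1 2
```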

Second normal form (2NF)

To get the data into second normal form, we must remove any items that depend on only part of a key to a separate relation. This obviously only applies to relations that have compound keys. We must determine whether any data items in a relation with a compound key are only dependent upon part of the compound key, or upon all of it. In the first of the two relations shown above in first normal form, the combination of Pat No, Presc Date and Med Code together determine the data items Dosage and Lgth Treat, but only Med Code is required to determine Med Name. Thus, Med Name is removed from the relation, and Med Code and Med Name form a new relation, with Med Code as the key. The two second normal form relations thus obtained are shown below.

Second normal form [1]

| Pat No | Presc Date | Med Code | Dosage                      | Lgth Treat |
|--------|------------|----------|-----------------------------|------------|
| 454    | 11.11.08   | CO5768   | 2 pills 3 x day after meals | 4 days     |
| 454    | 18.11.08   | MO1234   | Injection 4 hourly          | 4 days     |
| 454    | 22.11.08   | MO1234   | Injection 8 hourly          | 2 days     |
| 454    | 23.11.08   | PE9876   | 1 pill 3 x day              | 7 days     |
| 223    | 11.11.08   | AS2233   | 2 pills 3 x day after meals | 7 days     |
| 223    | 15.11.08   | VA7867   | 2 per day                   | 5 days     |

Second normal form [2]

| Med Code | Med Name |
|----------|----------|
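The 2NF step, splitting out the part-key dependency Med Code → Med Name, can be sketched in the same style:

```python
# 1NF prescription rows (a subset of the sample data).
first_nf = [
    {"pat_no": 454, "presc_date": "11.11.08", "med_code": "CO5768",
     "med_name": "Cortisone", "dosage": "2 pills 3 x day after meals"},
    {"pat_no": 454, "presc_date": "18.11.08", "med_code": "MO1234",
     "med_name": "Morphine", "dosage": "Injection 4 hourly"},
    {"pat_no": 454, "presc_date": "22.11.08", "med_code": "MO1234",
     "med_name": "Morphine", "dosage": "Injection 8 hourly"},
]

# Med Name depends on Med Code alone (only part of the compound key),
# so it moves to a new relation keyed on Med Code.
drugs = {row["med_code"]: row["med_name"] for row in first_nf}
prescriptions = [{k: v for k, v in row.items() if k != "med_name"}
                 for row in first_nf]

print(drugs)  # {'CO5768': 'Cortisone', 'MO1234': 'Morphine'}
```

Note that the two Morphine rows collapse to a single entry in the new Drug relation, removing the redundancy.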

Third normal form (3NF)

To get the data into third normal form, we need to remove any data items not directly dependent on the key to separate relations. Here, we are looking for data items that might be dependent on data items other than the primary key. In most cases, these inter-data dependencies are relatively easy to find. Consider the relation below, which we derived when moving from our un-normalised data to the first normal form and which, due to an absence of any part-key dependencies, is already in second normal form.

Second normal form [3]

| Pat No | Surname | Forename | Ward No | Ward Name |
|--------|---------|----------|---------|-----------|

We can see a possible inter-data dependency between Ward Name and Ward No. Furthermore, although Surname, Forename and Ward No are all dependent on Pat No in this relation, Ward Name does not appear to depend directly on Pat No. Ward Name is instead determined by Ward No alone, so we can create a new relation from Ward No and Ward Name, in which Ward No is the primary key. Ward No remains in the Patient relation, as its value is determined by Pat No. Here, however, it is acting as a foreign key. A foreign key is a data attribute that appears in a table in which it is not itself acting as the primary key (although it may form part of a compound primary key), whilst at the same time being the primary key in another table. Foreign keys are used to link tables together, as we will see. The complete set of third normal form relations is shown below.

Patient

| Pat No | Surname | Forename | Ward No |
|--------|---------|----------|---------|

Ward

| Ward No | Ward Name |
|---------|-----------|

Prescription

| Pat No | Presc Date | Med Code | Dosage                      | Lgth Treat |
|--------|------------|----------|-----------------------------|------------|
| 454    | 11.11.08   | CO5768   | 2 pills 3 x day after meals | 4 days     |
| 454    | 18.11.08   | MO1234   | Injection 4 hourly          | 4 days     |
| 454    | 22.11.08   | MO1234   | Injection 8 hourly          | 2 days     |
| 454    | 23.11.08   | PE9876   | 1 pill 3 x day              | 7 days     |
| 223    | 11.11.08   | AS2233   | 2 pills 3 x day after meals | 7 days     |
| 223    | 15.11.08   | VA7867   | 2 per day                   | 5 days     |

Drug

| Med Code | Med Name |
|----------|----------|

The Prescription and Drug relations developed when we derived our second normal form relations were already in third normal form, so we now have a complete set of well-normalised relations. We can apply tests to each of our 3NF relations to make sure that they are indeed in third normal form. First, we ask whether, for a given primary key value, there is only one possible value for each of the other data items in the relation. Second, we ask whether each of those data items is directly and wholly dependent on the primary key for the relation. If the answer to these questions is yes, our data is in third normal form. We can now express the relations in SSADM notation, to make them a little easier to interpret:
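The first of these tests — a single value of every non-key item for each primary key value — can be expressed in code. A sketch, again assuming relations held as lists of dictionaries:

```python
from collections import defaultdict

def single_valued(rows, key_fields):
    """True if each non-key attribute has exactly one value per key value."""
    seen = defaultdict(dict)  # primary key value -> {attribute: value}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        for attr, value in row.items():
            if attr in key_fields:
                continue
            if attr in seen[key] and seen[key][attr] != value:
                return False  # two values for one key value
            seen[key][attr] = value
    return True

patient = [
    {"pat_no": 454, "surname": "Smith", "forename": "John", "ward_no": 6},
    {"pat_no": 223, "surname": "Jones", "forename": "Peter", "ward_no": 8},
]
print(single_valued(patient, ["pat_no"]))  # True
```

The second test, direct and whole dependency on the key, still needs human judgement: functional dependencies are a property of the business rules, not of any one sample of data.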

Normalised relations in SSADM notation

Relational data analysis (or normalisation) is usually performed on a number of different data sources and will yield, from each source, a number of normalised relations. The process of combining those relations to form the full set of normalised relations for the information system under investigation is called optimisation. Relations with the same primary key are merged and given a name. The logical data structure that emerges from our analysis will be merged with the "Patient" and "Ward" relations which we examined at the start of this section. The resulting logical data structure is shown below.

The logical data structure for the hospital system

This article was originally published on the website in January 2009.

Cost-Benefit Analysis

Systems Analysis

Chris Wells, September 27, 2022

Cost-benefit analysis is used to determine the economic feasibility of a project. The total expected costs are weighed against the total expected benefits. If the benefits outweigh the costs over a given period of time, the project may be considered to be financially viable. The costs involved with a software development project will consist of the initial development cost (the costs incurred up to the point where the new system becomes operational), and the operating costs of the system throughout its expected useful lifetime (usually a period of five years). The expectation is that at some point in the system's lifetime, the accumulated financial benefits of the system will exceed the cost of development and the ongoing operating costs. This point in time is usually referred to as the break-even point.

The benefits of the new system are usually considered to be the tangible financial benefits engendered by the system. These could be manifested as reduced operating costs, increased revenue, or a combination of the two. In some cases there may be one or more less tangible benefits (i.e. benefits that cannot be measured in financial terms), but such benefits are difficult to assess. Indeed the accuracy of a cost benefit analysis is dependent on the accuracy with which the development costs, operational costs and future benefits of the system can be estimated, and its outcome should always be treated with caution.

Because money will devalue over time, it is misleading to simply directly compare future operating costs and tangible benefits with the initial cost of developing the system. A discount rate is therefore selected so that future costs and benefits can be represented in terms of their present-day value. The discount rate used is often the current interest rate used by financial markets. The future value (FV) of a sum of money invested today (the present value, or PV) at a fixed interest rate (i) for a known number of time periods (n) can be calculated as follows:

FV = PV × (1 + i)^n

Conversely, we can re-arrange this equation to get the present value of a future sum of money (e.g. the estimated future costs or benefits of the proposed system) as follows:

PV = FV / (1 + i)^n

Note that we have assumed here that n represents the time period in years. If the value of i represents an annual discount rate, and the time period n is measured in months, the value of i must be divided by twelve.
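The two formulas translate directly into code. A minimal sketch, including the monthly-rate adjustment just described:

```python
def future_value(pv, i, n):
    """FV = PV * (1 + i)^n, for n periods at rate i per period."""
    return pv * (1 + i) ** n

def present_value(fv, i, n):
    """PV = FV / (1 + i)^n."""
    return fv / (1 + i) ** n

# 1,000 invested for five years at 15% per annum:
print(round(future_value(1000, 0.15, 5), 2))  # 2011.36

# For monthly periods, divide the annual rate by twelve and count months:
print(round(present_value(2011.36, 0.15 / 12, 60), 2))
```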

System development costs (C) are assumed to be incurred when the system is commissioned. Taking an expected system life of five years, the yearly benefits (B1, B2, B3, B4 and B5) are assumed to occur at the end of year one, year two, year three, year four, and year five respectively. To compute the net present value (NPV), those benefits are discounted back to their present values and added to the cost of development, C (a negative value), as follows:

NPV = C + B1/(1+i)^1 + B2/(1+i)^2 + B3/(1+i)^3 + B4/(1+i)^4 + B5/(1+i)^5

When the net present value is positive, the system has passed the break-even point (i.e. it has paid for itself and justified the cost of the project). Many organisations use the internal rate of return (IRR) to gauge the economic viability of a project. This figure is calculated by dividing the net present value (the sum of the discounted benefits and the discounted costs, the latter being negative) by the present value of the total cost of the system, including the development costs and all operating costs. The following simple example demonstrates some of these principles:
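The NPV formula maps onto a short function. The cash-flow figures in the usage lines are invented for illustration and do not come from the worked example that follows:

```python
def npv(cost, benefits, i):
    """Net present value: development cost (a negative value) plus the
    yearly benefits B1..Bn discounted back to their present values."""
    return cost + sum(
        b / (1 + i) ** year for year, b in enumerate(benefits, start=1))

# Hypothetical project: 50,000 development cost, net benefits of
# 20,000 at the end of each of five years, discounted at 15%.
value = npv(-50_000, [20_000] * 5, 0.15)
print(round(value, 2))  # positive, so the project passes break-even
```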

Cost-benefit analysis example

A new automated customer invoicing system has been recommended for our organisation by a firm of consultants. The requirements are already known, and we now want to carry out a cost benefit analysis. The system will cost £50,000 to develop, and will have a projected useful life of five years from the time it is installed, one year from now. After five years, the system database can be transferred to the replacement system, a saving of approximately £10,000.

The current system has operating costs estimated at £100,000 per annum, whereas the annual operating costs for the new system are estimated at only £75,000. In addition, the new system has intangible benefits estimated to be worth £10,000 per annum. It is assumed that all estimates for costs and benefits will increase at a rate of 10% annually for both the current system and the new system. The organisation has set a present value discount rate of 15% per annum. The cost-benefit analysis calculations are carried out using a spreadsheet, and are shown below:

A spreadsheet can be used to perform a cost-benefit analysis

The system starts to show a positive return during its second year of operation. The internal rate of return is calculated as follows:

(Cumulative PV Benefits + Cumulative PV Costs) / Cumulative PV Costs = 133,686 / 348,938 = 0.383

This represents an IRR of 38.3% over the expected lifetime of the system, or 7.66% per year.

This article was originally published on the website in January 2009.

Data Flow Diagrams

Systems Analysis

Chris Wells, September 27, 2022

Data flow diagrams (DFDs) are a way of representing a system's business processes, the flow of data into and out of those processes, and the flow of data between the system and the external agencies with which it interacts. The resulting graphical views of the system, its processes, and its data flows can be used as a basis for discussion between system developers and users of the system. A hierarchy of DFDs is produced, starting with an overview that provides a very abstract view of the system and ending with a number of diagrams representing the lowest-level sub-processes. The highest level DFD is the context diagram, which simply shows the system of interest, the external entities with which it interacts, and the data flows between the system and the external entities. A typical context diagram is shown below.

A context diagram for an order management system

You will notice several things about this diagram. An external entity (in this case "Customer") is shown as an ellipse, appropriately labelled. An external entity is either a source of data entering the system, or a destination for data leaving the system, or (more often than not) both. The system itself is simply shown as a rectangular box with a suitable name. A data flow is represented by a line, suitably labelled, with an arrow at one end (or in some cases both ends) showing in which direction the data is flowing. The Customer external entity is duplicated on the diagram for the sake of clarity, to avoid too many data flows close together. The diagonal line in the top left corner of the Customer external entity symbol is to indicate that more than one instance of this entity appears on the diagram. The last thing of note is the representation of a resource flow ("Goods"), which is shown as a line with an outline arrow-head (or sometimes a double-headed arrow) as opposed to a solid arrowhead. The context diagram serves to define the system boundary. Any entity with which the system interacts, but which is not a part of the system itself, is an external entity.

Before doing too much work on a set of data flow diagrams, it is worth drawing up a list of the external entities providing inputs to or receiving outputs from the system, and identifying those inputs and outputs. In addition, it would be useful to identify all of the high-level business activities included within the system boundary, and relate these activities to specific inputs and outputs. The sort of questions to be asked include questions such as "Who does what, when, where and how?" and "What data do each of these people need to carry out their tasks?".

Physical versus Logical DFDs

Existing and proposed systems can be modelled using physical and logical DFDs. A physical DFD shows how the system is (or will be) constructed, whereas a logical DFD is not concerned with the physical aspects of the system.

Physical DFDs clarify which processes are manual and which are automated, and describe processes in more detail than logical DFDs. They also show the sequence in which processes must be carried out, identify temporary data stores, specify the actual names of files and printouts, and define any controls used to ensure that processes are carried out correctly.

Logical DFDs concentrate on the logical flow of data between business processes rather than the physical implementation of the system, and allow analysts to understand the business more clearly. They attempt to rationalise the lowest-level processes and group them together to form the Level 1 DFD. They also attempt to rationalise the data stores in the system, to relate each data store to one or more entities in the Logical Data Structure (LDS), and to ensure that each entity is found in only one data store. The logical DFD provides a solid basis on which to carry out a discussion of the system with users, and results in more stable systems. It also facilitates the elimination of redundancy, and makes it easier to create the final physical model.

Data flow diagrams can be used to represent the system, not only at different levels of detail, but from different perspectives. The four main types of data flow diagram are described below:

  • Current logical DFD - describes what the system does, but not necessarily how it does it. This is useful for discussing the functionality of the system without getting bogged down in too much detail.
  • Current physical DFD - describes what the system does and how the functionality is currently implemented. This type of diagram is useful for highlighting redundant processes and data stores, and for giving the analyst an insight into how the system operates in its present form.
  • Required logical DFD - describes what the new system must be able to do, but not necessarily how it should do it. This is useful for achieving consensus between developers and users on a requirements specification.
  • Required physical DFD - describes what the new system will do and how the functionality will be implemented. This type of diagram is produced during the design stage, and is useful for conveying to users how the system will be implemented.

The required physical and logical DFDs should reflect any changes that are to be made to the existing system, together with any additional features to be incorporated into the new system. These changes and additions should be recorded in a document called the requirements catalogue. The additional features proposed for the new system can be further categorised into those that are mandatory ("must have" features), and those that are seen as desirable but not essential ("would like to have" features).

The DFD Hierarchy

DFDs at different levels are part of a structured hierarchy, with the lowest tier showing the greatest level of detail. Functional decomposition is carried out until an appropriate level of detail has been attained, and a definitive Elementary Process Description (EPD) is defined for each lowest level process. DFDs are used to analyse the system to ensure that the final design is complete, and to provide important system documentation. As already mentioned, the highest level of DFD is the context diagram, which shows the system's relationship to the external entities with which it interacts.

A context diagram for a simple library system

We can represent the high-level processes within the system of interest by creating a level 1 data flow diagram. An example of a level 1 DFD for our order management system is shown below.

A level 1 data flow diagram for an order management system

We now have three processes, "Manage enquiry", "Manage order", and "Manage sales ledger". Each process is represented by a rectangle, subdivided into three smaller rectangles. Each has a descriptive name that provides a clue as to the type of activity taking place within it, and an ID number in the top left corner. Note that the ordering of these ID numbers is purely arbitrary, and there is no priority implied by it. The space to the right of the ID number can be used, if required, to identify the person or department responsible for the process, or the location at which it occurs. A process is some activity that receives data, transforms it in some way, and (usually) outputs it again in a modified format.

We also have three data stores, "Sales orders", "Quotations" and "Invoices". The data store symbol is also a rectangle, subdivided into two smaller rectangles and open at one end. The boxed in area at the left side of the data store symbol contains an ID number, prefixed with an upper case "D" (for data, presumably!). Again, no priority is implied by the numbering of data stores. To the right of the ID number is the name of the data store, which usually gives a clue as to the kind of information held. The data store is a generic representation of some physical or electronic data storage medium, such as index cards or a database file. Like external entities, data stores can be duplicated on the same diagram for the sake of clarity.

Let's see another example. A level 1 DFD for our simple library system might look like this:

A Level 1 DFD for a simple library system

Further analysis

The context diagram and level 1 DFD may well be re-drawn a number of times before a consensus is reached between developers and users that the diagrams accurately represent all of the high level processes, data stores, and data flows. Looking at the above DFD, for example, it will become apparent that we do not currently have a data store for customer information. Once these diagrams are considered to be substantially correct, however, each of the high-level processes included in the level 1 DFD will probably require further analysis to break them down into their constituent sub-processes, resulting in a level 2 DFD being produced for each process shown on the level 1 DFD. A level 2 DFD for the "Manage enquiry" process is shown below.

A level 2 data flow diagram for the "Manage enquiry" process

Each (parent) process in the level 1 DFD will be decomposed into lower level (child) sub-processes. The lower level processes may be further decomposed if necessary, although it is unusual to have to do this beyond level 3, and often level 2 is sufficient. Note that the data flows in and out of a parent DFD must be evident in the child DFD. Note also that the ID number allocated to the parent process is carried down to each child process. In the example above, the "Manage enquiry" process has an ID number of 2, and the three sub-processes have the ID numbers 2.1, 2.2 and 2.3. Once a low level process is considered to be a discrete task that is sufficiently atomic in nature, no further decomposition is necessary and an elementary process description (EPD) can be produced for each low-level process (see example below).

Elementary process descriptions for "Manage enquiry"
Elementary process descriptions for "Manage enquiry"

DFD Guidelines

When drawing data flow diagrams, you should attempt to comply with the following requirements:

  • All processes must have at least one data flow in, and one data flow out
  • Each process should represent only one activity at a particular level
  • Each data store must have both inputs and outputs, and relate to at least one data flow
  • Each external entity must relate to at least one data flow
  • Each data flow must be attached to at least one process
  • A data flow from an external entity must flow into a process
  • A data flow to an external entity must flow from a process
  • A data flow to a data store can only come from a process
  • A data flow from a data store can only go to a process

Common errors in DFDs include the omission of data flows, showing data flowing in the wrong direction, connecting data stores and external entities directly to each other, and incorrectly labelling processes or data flows. As a general rule, no more than about twelve processes should appear on a single DFD. A lower-level (child) DFD should have the same data flows in and out as its parent process.
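
The guidelines above amount to a set of connection rules that can be checked programmatically. The sketch below is a minimal, illustrative validator (the function and the node and flow names are invented), assuming each node on the diagram is classified as a process, data store or external entity:

```python
# Illustrative sketch: validating DFD connection rules.
# Node and flow names are invented example data.

PROCESS, DATA_STORE, EXTERNAL = "process", "data store", "external entity"

def validate_flows(nodes, flows):
    """nodes maps a name to its kind; flows is a list of (source, destination).
    Returns a list of violations of the guidelines above."""
    errors = []
    for src, dst in flows:
        # At least one end of every data flow must be a process
        if nodes[src] != PROCESS and nodes[dst] != PROCESS:
            errors.append(f"{src} -> {dst}: a data flow must involve a process")
    for name, kind in nodes.items():
        inputs = [f for f in flows if f[1] == name]
        outputs = [f for f in flows if f[0] == name]
        if kind == PROCESS and (not inputs or not outputs):
            errors.append(f"{name}: a process needs at least one flow in and one out")
        if kind == DATA_STORE and (not inputs or not outputs):
            errors.append(f"{name}: a data store needs both inputs and outputs")
        if kind == EXTERNAL and not inputs and not outputs:
            errors.append(f"{name}: an external entity must relate to a data flow")
    return errors

nodes = {"Customer": EXTERNAL, "Manage enquiry": PROCESS, "Quotations": DATA_STORE}
flows = [("Customer", "Manage enquiry"),
         ("Manage enquiry", "Quotations"),
         ("Quotations", "Manage enquiry"),
         ("Manage enquiry", "Customer")]
```

A direct flow between, say, an external entity and a data store would be reported as a violation, as would a process with no outputs.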

This article is adapted from material first published on the website in January 2009.

Requirements Analysis

Systems Analysis

Chris Wells, September 27, 2022

Requirements analysis is carried out in two stages. First of all, an investigation of the current system is carried out. This enables the scope of the project to be determined, and highlights any problems with the system. The kind of problems identified could include redundant processing or processes that create bottlenecks, superfluous procedures, or excessive data redundancy. The initial investigation should identify users (and potential users) of the system, define the nature, volume and frequency of business transactions handled, and catalogue any existing hardware or software used.

A detailed system investigation is usually preceded by a feasibility study which, as its name suggests, will determine whether the project's stated objectives can be achieved within an acceptable time frame and at an affordable cost. Depending on the results of the feasibility study, some of the project parameters may need to be adjusted.

The second stage is to investigate a number of possible business options, including the identification of any additional features or services that the new system may be required to provide. The existing and proposed systems can be modelled using physical and logical Data Flow Diagrams (DFDs). A physical DFD shows how the system is (or will be) constructed, whereas a logical DFD is not concerned with the physical aspects of the system.

The data used in the current system is also examined and defined in a process that attempts to model the structure of the data by grouping data items into logical data entities. The process ignores the method of data storage used, or the physical location of the various data stores, and concentrates instead on the logical structure of the system's data. The relationships between the various data entities are defined, together with the key data values that are used to link them together. Data modelling can also be used to identify any shortcomings or omissions in the current system's dataset that should be addressed by the new system, and any new requirements identified are added to the requirements catalogue. Data modelling techniques will be looked at in more detail elsewhere.

The outcome of the requirements analysis stage is a detailed requirements specification that contains sufficient information to allow the new information system to be designed. A clear boundary will be defined for the proposed system that specifies what processes are included within it, and the external entities with which it interacts. The required system inputs and outputs are specified, together with the functionality that the system must provide. Performance requirements will be outlined in terms of user response times, processing speed, the volume of data storage to be provided, and the number and frequency of transactions to be handled. Special requirements should also be identified, such as the level of security to be implemented and the facilities that should be provided to enable routine maintenance and the backup of system data to be carried out efficiently. The specification may also make specific recommendations in terms of the application software that must be enabled to run on the system, and the minimum specification for system hardware such as file servers, workstations, communications devices, and network cabling.

The Feasibility Study

A feasibility study is, in many ways, a highly compressed version of the analysis phase of the system development life cycle. Its purpose is to determine, in a timely fashion and at reasonable cost, whether the problem can be solved and, if so, whether it can be solved cost-effectively. In many cases, the feasibility study is also used to obtain a clear and in-depth definition of the problem, and to determine the scope of the proposed system. Although the cost of the feasibility study itself is not insignificant, its main function is essentially to determine whether far greater financial resources should be committed to the project under consideration. The outcome of the study will be a formal report to management, which will provide the basis for the decision on whether or not to proceed with the project. The report may outline a number of alternative solutions, and will provide estimates of the time, resources and financial outlay required for each option.

In the short term, the main questions that must be answered are whether the project is technically feasible (i.e. can the objectives identified be achieved with the technological resources available), whether the project can be completed in the time available, and whether the funds required to finance the project will be available. Perhaps a more important question is whether the long-term benefits to the business of the new system will outweigh the cost of implementing and maintaining it. This question will be answered by undertaking a cost/benefit analysis. If a decision is made to go ahead with the project, the information gathered can be used to prepare project schedules, allocate resources, and provide a cost estimate for the project. Bear in mind, however, that the feasibility study is optional. For small projects or for projects that, for one reason or another, must be undertaken (for example to satisfy the requirements of new legislation), the feasibility study may be omitted.
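
As a minimal illustration of the arithmetic behind a cost/benefit analysis (all figures and the function name below are invented for the example), the expected annual benefits of a system can be discounted and set against its up-front cost to give a net present value:

```python
# Illustrative cost/benefit sketch; all figures are invented.

def net_present_value(initial_cost, annual_net_benefit, discount_rate, years):
    """Discount each year's net benefit back to today's value and
    subtract the up-front implementation cost."""
    discounted = sum(annual_net_benefit / (1 + discount_rate) ** year
                     for year in range(1, years + 1))
    return discounted - initial_cost

# A system costing 100,000 up front, expected to return 30,000 a year
# for 5 years, assessed at a 10% discount rate:
npv = net_present_value(100_000, 30_000, 0.10, 5)
# A positive NPV suggests the long-term benefits outweigh the costs.
```

In practice a cost/benefit analysis would also weigh intangible benefits and ongoing maintenance costs, but the basic comparison takes this form.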

The System Investigation

When a new or upgraded system is introduced into an organisation, it is usually intended to support the work already carried out by the organisation. Although the new system may differ substantially from the existing system, the information being handled, and the main functions of the system, will remain relatively unchanged. An analysis of the existing system, therefore, provides a firm basis for the design of the new system.

Other reasons for performing a system investigation include:

  • Determining the scope of the project, by evaluating the complexity of the problem and the effort required to complete it. This information can assist with planning the project and allocating the necessary resources to it.
  • Increasing user confidence, by reassuring users that the analyst fully understands the nature of the problem, and the business operations that the system must carry out.

Some of the aspects that will be investigated include:

  • Operations and data - an understanding of the current operations and data will make it easier to understand the requirements of the new system
  • Existing problems - determining what the problems are with the existing system ensures that they will not be replicated in the new system
  • System boundaries - determine which business areas are within the scope of the project, to ensure that effort is not wasted on areas that lie outside the scope of the project, and to ensure that all relevant areas are included. The system boundaries should be explicitly stated and agreed with all those concerned.

Note that an investigation of the existing system does not constrain the project team to simply re-hash the features of the current system. A fresh approach to meeting the system objectives is taken, and may lead to a complete restructuring of both the business processes and data, in a new system that is nothing like the old one.

Some investigation techniques:

  • Interviews
  • Studying the current system documentation
  • Questionnaires
  • Observation of the system in operation
  • Reviewing previous studies
  • Surveys

Some or all of these techniques may be used in the investigation of the existing system, and will result in a set of data flow diagrams representing the current system, a logical data structure of the current system's data, and the initial problems/requirements list for the project.


Development Methodologies

Systems Analysis

Chris Wells, September 27, 2022

The term software development methodology is used to describe a framework for the development of information systems. A particular methodology is usually associated with a specific set of tools, models and methods that are used for the analysis, design and implementation of information systems, and each tends to favour a particular lifecycle model. Often, a methodology has its own philosophy of system development that practitioners are encouraged to adopt, as well as its own system of recording and documenting the development process. Many methodologies have emerged in the past few decades in response to the perceived need to manage different types of project using different tools and methods. Each methodology has its own strengths and weaknesses, and the choice of which approach to use for a given project will depend on the scale of the project, the nature of the business environment, and the type of system being developed. The following sections describe a small number of software development approaches that have evolved over the years.

Structured Systems Analysis and Design Method (SSADM)

The Structured Systems Analysis and Design Method (SSADM) is a highly structured and rigorous approach to the analysis and design of information systems, one of a number of such methodologies that arose in response to the large number of information system projects that either failed completely or did not adequately fulfil customer expectations.

Early large scale information systems were often developed using the Cobol programming language together with indexed sequential files to build systems that automated processes such as customer billing and payroll operations. System development at this time was almost a black art, characterised by minimal user involvement. As a consequence, users had little sense of ownership of, or commitment to, the new system that emerged from the process. A further consequence of this lack of user involvement was that system requirements were often poorly understood by developers, and many important requirements did not emerge until late in the development process, leading to costly re-design work having to be undertaken. The situation was not improved by the somewhat arbitrary selection of analysis and design tools, and the absence of effective computer aided software engineering (CASE) tools.

Structured methodologies use a formal process of eliciting system requirements, both to reduce the possibility of the requirements being misunderstood and to ensure that all of the requirements are known before the system is developed. They also introduce rigorous techniques to the analysis and design process. SSADM is perhaps the most widely used of these methodologies, and is used in the analysis and design stages of system development. It does not deal with the implementation or testing stages.

SSADM is an open standard, and as such is freely available for use by companies or individuals. It has been used for UK government information systems development since 1981, when it was first released, and has also been used by many companies in the expectation that its use will result in robust, high-quality information systems. SSADM is still widely used for large-scale information systems projects, and many proprietary CASE tools are available that support SSADM techniques.

The SSADM standard specifies a number of modules and stages that should be undertaken sequentially. It also specifies the deliverables to be produced by each stage, and the techniques to be used to produce those deliverables. The system development life cycle model adopted by SSADM is essentially the waterfall model, in which each stage must be completed and signed off before the next stage can begin.

SSADM techniques

SSADM revolves around the use of three key techniques that derive three different but complementary views of the system being investigated. The three different views of the system are cross referenced and checked against each other to ensure that an accurate and complete overview of the system is obtained. The three techniques used are:

  • Logical Data Modelling (LDM) - this technique is used to identify, model and document the data requirements of the system. The data held by an organisation is concerned with entities (things about which information is held, such as customer orders or product details) and the relationships (or associations) between those entities. A logical data model consists of a Logical Data Structure (LDS) and its associated documentation. The LDS is sometimes referred to as an Entity Relationship Model (ERM). Relational data analysis (or normalisation) is one of the primary techniques used to derive the system's data entities, their attributes (or properties), and the relationships between them.
  • Data Flow Modelling - this technique is used to identify, model and document the way in which data flows into, out of, and around an information system. It models processes (activities that act on the data in some way), data stores (the storage areas where data is held), external entities (an external entity is either a source of data flowing into the system, or a destination for data flowing out of the system), and data flows (the paths taken by the data as it moves between processes and data stores, or between the system and its external entities). A data flow model consists of a set of integrated Data Flow Diagrams (DFDs), together with appropriate supporting documentation.
  • Entity Behaviour Modelling - this technique is used to identify, model and document the events that affect each entity, and the sequence in which these events may occur. An entity behaviour model consists of a set of Entity Life History (ELH) diagrams (one for each entity), together with appropriate supporting documentation.
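
As a minimal sketch of what a fragment of a logical data model might record (the entities, attributes and relationship below are invented examples, not SSADM notation), a master-detail relationship between two entities could be represented as:

```python
# Illustrative sketch of a fragment of a Logical Data Structure.
# The entities, attributes and relationship are invented examples.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    key: str                 # attribute that uniquely identifies an occurrence
    attributes: list = field(default_factory=list)

@dataclass
class Relationship:
    master: Entity           # the "one" end of a one-to-many relationship
    detail: Entity           # the "many" end
    foreign_key: str         # detail attribute that links back to the master

customer = Entity("Customer", key="customer_no", attributes=["name", "address"])
order = Entity("Order", key="order_no",
               attributes=["order_date", "customer_no"])

# Each Customer places many Orders; the key value customer_no links them.
places = Relationship(master=customer, detail=order, foreign_key="customer_no")
```

Relational data analysis (normalisation) would be used to confirm that each attribute belongs with the entity whose key determines it.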

SSADM's structured approach

Activities within the SSADM framework are grouped into five main modules. Each module is sub-divided into one or more stages, each of which contains a set of rigorously defined tasks. SSADM's modules and stages are briefly described below.

The SSADM framework

  • Feasibility Study (module 1)
    Stage 0 - The high-level analysis of a business area to determine whether a proposed system can cost-effectively support the business requirements identified. A Business Activity Model (BAM) is produced that describes the business activities and events, and the business rules in operation. Problems associated with the current system, and the additional services required, are identified. A high-level data flow diagram is produced that describes the current system in terms of its existing processes, data stores and data flows. The structure of the system data is also investigated, and an initial LDM is created.
  • Requirements Analysis (module 2)
    Stage 1: Investigation of Current Environment - The system's requirements are identified, and the current business environment is modelled using data flow diagrams and logical data modelling.
    Stage 2: Business System Options - Up to six business system options are presented, of which one will be adopted. Data flow diagrams and logical data models are produced to support each option. The option selected defines the boundary of the system to be developed.
  • Requirements Specification (module 3)
    Stage 3: Definition of Requirements - Detailed functional and non-functional requirements (for example, the levels of service required) are identified, and the required processing and system data structures are defined. The data flow diagrams and logical data model are refined, and validated against the chosen business system option. The data flow diagrams and logical data model are then validated against the entity life histories, which are also produced during this stage. Parts of the system may be produced as prototypes and demonstrated to the customer to confirm correct interpretation of requirements and obtain agreement on aspects of the user interface.
  • Logical System Specification (module 4)
    Stage 4: Technical System Options - Up to six technical options for the development and implementation of the system are proposed, and one is selected.
    Stage 5: Logical Design - In this stage the logical design of the system, including user dialogues and database enquiry and update processing, is undertaken.
  • Physical Design (module 5)
    Stage 6: Physical Design - The logical design and the selected technical system option provide the basis for the physical database design and a set of program specifications.

SSADM is well suited to large and complex projects where the requirements are unlikely to change significantly during the project's life cycle. Its documentation-oriented approach and relatively rigid structure make it inappropriate for smaller projects, or for those whose requirements are uncertain or likely to change because of a volatile business environment.

Rapid Application Development

Rapid application development (RAD) is an iterative and incremental software development process that is designed to produce software as quickly as possible. The term tends to refer to a range of techniques geared to the rapid development of applications, such as the use of various application development frameworks. RAD was an early response to more structured and formal approaches like SSADM which were not felt to be appropriate for projects undertaken within a highly volatile and evolving business environment.

The philosophy behind RAD is that there is an acceptable trade-off between the speed of development and the overall functionality or performance of the software delivered. Put another way, RAD can deliver a working solution that provides 80% of the required functionality in 20% of the time required by more traditional approaches. Two major benefits of this approach are that the customer gets to see results very quickly, and that a production version of the software will be available within a relatively short time frame, greatly reducing the likelihood that the customer's business environment will have undergone significant change by the time the new system is delivered. The downside is that some of the desirable (but non-essential) features of the software may be sacrificed in order to speed development, and the performance of the resulting system, while acceptable, may not be optimal. System acceptance is based upon the system achieving the agreed minimum functionality and usability.

A RAD team is usually small (maybe six or so people, including both developers and users), and the developers are usually required to be both experienced and multi-skilled, since they will be combining the roles of analyst, designer and programmer. The project begins with an initial Joint Application Development (JAD) meeting, during which developers and customer representatives determine the initial requirements of the system and agree a time frame in which a prototype system will be ready. The developers design, build and test a prototype system that reflects these initial requirements. The customer then evaluates the prototype system to determine how far it meets their requirements, and what functionality or features need to be improved or added.

A focus group meeting then takes place, during which the customer reports back to the development team. The requirements specification is revised to incorporate new features and improvements, and the time frame for the next iteration is agreed. Features that are deemed to be of secondary importance may, by negotiation, be dropped from the new requirements specification if they will negatively impact on the time frame for the new prototype. The cycle of iterations and focus group meetings continues until a final prototype is accepted by the customer as a production version of the new system.

The Rapid Application Development life cycle
The Rapid Application Development life cycle

The "time box" in which an iteration occurs is short (usually from a few days up to about three weeks). Documentation of requirements and design is usually restricted to notes taken from meetings, rather than the formal documentation associated with more structured methodologies, and will consist of the minimum required to facilitate the development and maintenance of the system. The entire life cycle is relatively short (usually a few months), and should result in a steady convergence between the customer's concept of the new system and that of the development team, resulting in a workable business solution that is fit for its intended purpose.

One of the benefits claimed for RAD was that, because customers often had only a vague idea of what they wanted, the availability of a working prototype would help to crystallise their thoughts in this respect and enable them to evolve a more definitive set of requirements. Whereas some system development methodologies attempted to determine the complete set of requirements in advance in an attempt to eliminate future changes to the scope of the project, RAD was able to incorporate change as part of an evolutionary development process.

RAD leveraged the benefits of a number of software development tools in order to speed up the development process, including a range of computer aided software engineering (CASE) tools. Code re-use, the use of object-oriented programming languages, and the utilisation of third-party software components were all embraced by RAD developers. Fourth-generation visual programming languages, the forerunners of today's integrated development environments (IDEs), were used to create the graphical user interface (GUI), while code production was further speeded up through the use of an appropriate application programming interface (API) that provided much of the base code for the application.

RAD tended to be used successfully for projects that did not have a high degree of criticality, and where the trade-off between a short development time frame on the one hand, and quality and performance on the other, was acceptable. It was not suitable for systems where optimal performance was required, or that had to interoperate with existing systems. The flexibility of RAD lay in the ability to produce results quickly and adapt specifications to meet an evolving set of customer requirements. From the customer's point of view, seeing a working prototype early on in the proceedings helped them to focus on what they did or didn't want from the system, and the continuing dialogue with the development team meant that developers had a good understanding of the customer's requirements.

The speed of development and the relatively small size of development teams tended to result in reduced development costs, although the absence of classic project milestones sometimes made it difficult to accurately measure progress. Today, some of the principles of RAD have been adopted by practitioners of agile development methods, themselves a response to an increasingly volatile business environment.

Agile Software Development

Agile software development refers to a group of loosely related software development methodologies that are based on similar principles. Notable examples include the Unified Software Development Process (USDP) and Extreme Programming (XP). Agile methodologies are characterised by short life-cycle iterations (typically measured in weeks), with minimal planning and documentation. The goal is to deliver working software to the customer at the end of each cycle. Each iteration involves a number of phases including planning, requirements analysis, implementation, and testing. This incremental, iterative approach helps to reduce overall risk, while enabling the output of the project to be adapted to meet changing requirements or circumstances. Documentation is generally limited to what the customer requires. Agile methodologies have evolved as an alternative to more traditional, process-driven methodologies.

The emphasis is on frequent (usually daily) face-to-face communication within the project team, and between the project team and the customer. Written documentation is of secondary importance, and meetings are usually formal but brief. Project teams are typically small (usually less than a dozen people) to facilitate communication and collaboration. Where agile methods are applied to larger projects, the different parts of a project may be allocated to several small teams of developers.

Each iteration of the project life cycle results in the production of working software, which is then evaluated by the customer before the next iteration begins. The production of working software, rather than the completion of extensive project documentation, is seen as the primary measure of progress. The software produced at the end of an iteration has been fully developed and tested, but embodies only a subset of the functionality planned for the project as a whole. The aim is to deliver functionality incrementally as the project progresses. Further functionality will be added, and existing features will be refined, throughout the life of the project.

Agile methods are favoured over more structured methodologies for projects where requirements are not initially well defined, or are likely to change over the lifetime of the project. They work well where the project team is small, and comprised of experienced developers. Both the project management techniques and the development tools used are selected on a project-by-project basis, allowing the overall process to be tailored to the needs of a particular project. The short duration of iterations and the absence of a rigid set of requirements or design documentation allow developers to respond quickly to changing requirements and circumstances. The emphasis on constant interaction between customers and developers provides continual feedback that helps to keep the project on track.

Agile methods are not so well suited to large-scale projects where the requirements are well defined, where the business environment is relatively non-volatile, or where the predominant organisational culture is intolerant of a lack of structure or documentation. The emphasis on frequent face-to-face communication as an essential element of the development process means that agile methods do not lend themselves easily to projects that are distributed over a wide geographical area, or that require large teams of developers. Critics of agile methods have also pointed out the difficulties that may arise in terms of negotiating a contract or determining the cost of a project where the scope of the project is not initially well-defined, and requirements are unclear.

Unified Software Development Process (USDP)

The Unified Software Development Process (USDP) is an iterative and incremental software development process framework which, it is claimed, can be adopted for the development of both small and large scale information systems. The development cycle is divided into four main phases:

  • Inception - this is usually a fairly short phase that is primarily used to establish the scope and objectives of the project. It lays down both the overall aims and the specific functional objectives, such as being able to log into and out of the system. These specific functional objectives are referred to as use cases. The phase will also identify one or more candidate architectures for the system, identify risks, and determine a preliminary project schedule and cost estimate. The end of the inception phase is marked by the Lifecycle Objective milestone.
  • Elaboration - in this phase, most of the system requirements, the known risks, and the system architecture are established. Use case diagrams, conceptual diagrams and package diagrams are created. A partial implementation of the system is produced in a series of short, time-boxed iterations that includes the core architectural components, and establishes an executable architecture baseline. The other deliverable from this phase is a blueprint for the next phase (the construction phase) that includes estimates of the cost and the time required for completion. The end of the elaboration phase is marked by the Lifecycle Architecture milestone.
  • Construction - the remaining parts of the system are built on the foundations laid down in the previous phase in a series of short, time-boxed iterations, each resulting in a software release. A number of common Unified Modelling Language (UML) diagrams are used during this phase for the purpose of specifying, visualising, constructing and documenting the system. The end of the construction phase is marked by the Initial Operational Capability milestone.
  • Transition - the system is deployed to its operational environment and user community. Feedback from an initial release may lead to further iterations within the transition phase that incorporate further refinements to the system. This phase may also include activities such as data conversion and user training. The end of the transition phase is marked by the Product Release milestone.

The system architecture describes the various functional subsystems that make up the system, such as those responsible for handling input and output, data communications, and information reporting, and the interactions between them and the rest of the system. A risk is any obstacle to success (e.g. insufficient or inexperienced personnel, lack of funding, or severe time restrictions). Each iteration results in a single release of the system, although there can be one or more intermediate builds within a single iteration. The feedback from each release is used to shape future iterations.

The unified process defines six core process disciplines:

  • Business modelling
  • Requirements
  • Analysis & Design
  • Implementation
  • Testing
  • Deployment

Most iterations will include some work in most of the process disciplines. The relative emphasis placed on each activity, and the effort it requires, will change over the course of the project. This is illustrated by the following diagram:

The Unified System Development Process lifecycle
The Unified System Development Process lifecycle

Extreme Programming

Extreme Programming (or XP) is an agile software development methodology that takes traditional software engineering practices to "extreme" levels to achieve a development process that is more responsive to customer needs than traditional methods, while still producing good-quality software. Changing requirements are seen as an inescapable feature of software development projects in an increasingly unpredictable business environment. XP practitioners believe that a software development methodology that embodies the capacity to adapt to changing requirements is a more realistic approach than trying to define all of the requirements at the start of a project. Rapidly changing requirements demand shorter development life cycles, and are incompatible with traditional methods of software development.

Individual developers are assigned specific tasks, and are responsible for their completion. No code is written until unit tests have been designed for individual code components and subsystems. The customer is responsible for defining appropriate acceptance tests that are subsequently used to validate the software produced during an iteration. At the end of an iteration, the development team delivers a working system to the customer. The system may not be complete, but all functionality implemented works. A further meeting is scheduled to plan the next iteration, and the cycle begins again.

The Extreme Programming methodology encompasses a set of values, principles and practices designed to facilitate the rapid development of high-quality software that satisfies customer requirements.

The twelve core practices of XP are described below.

  • The planning game - the development team collaborates with the customer to produce working software as quickly as possible. The customer produces a list of the required system features, each described in a user story, which gives the feature a name and outlines its functionality. User stories are typically written on index cards. The development team estimates the effort required to code each story, and how much work the team can complete in a single iteration. The customer decides which user stories to implement, and in what order, as well as how often to produce production releases of the system.
  • Small releases - the first iteration produces a working software release that embodies the functionality identified by the customer as being the most essential. Subsequent iterations add additional features as requested by the customer. Iterations are of fixed length (typically from two to three weeks).
  • System metaphor - each project has an organising metaphor, which provides an easy-to-remember naming convention.
  • Simple design - the simplest possible design is used that will satisfy customer requirements. Because of the high probability of changes to requirements, only currently known requirements will be considered.
  • Test driven development - unit tests are written by developers to test functionality as they write code. Acceptance tests are specified by the customer to test that the overall system is functioning as expected. All tests must be successfully completed before software is released.
  • Refactoring - any duplicate or unnecessary code generated in a coding session is eliminated, keeping the code base clean and encouraging the reuse of code.
  • Pair programming - all code is written by two programmers working together on one computer, with the aim of producing high quality code. One person will focus on coding while the other will focus on strategic issues.
  • Collective code ownership - no one programmer "owns" a code module. Any developer can be required to work on any part of the code at any time.
  • Continuous integration - all changes are integrated into the system daily. Integration testing must be successfully carried out before further integration occurs.
  • Sustainable pace - developers are expected to be able to go home on time. Excessive overtime is taken as a sign that something is wrong with the development process.
  • Whole team - the development team has continuous access to a customer representative.
  • Coding standards - all programmers are expected to write code to the same standards. Ideally, it should not be possible to tell which member of the development team has written a code module simply by examining the code.
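The test-driven development practice described above can be illustrated with a short sketch. This is a hypothetical example (the function name and figures are invented for illustration): the unit test is written first, initially fails, and only then is just enough code written to make it pass.

```python
# Test-driven development sketch: the test exists before the code it tests.

def test_add_vat():
    # Written first; it failed until add_vat was implemented below.
    assert add_vat(100.00) == 120.00            # standard rate applied
    assert add_vat(100.00, rate=0.0) == 100.00  # zero rate leaves price unchanged

def add_vat(net_price, rate=0.2):
    """Return the gross price after applying VAT at the given rate."""
    return round(net_price * (1 + rate), 2)

test_add_vat()
```

In a real XP project the developer would run the failing test, implement the simplest code that makes it pass, and then refactor, repeating the cycle for each new piece of functionality.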

Extreme Programming may be appropriate for relatively small-scale projects where the requirements change rapidly, or where some initial development is needed before unforeseen implementation problems can be identified. It may not work so well for larger projects, or projects where the requirements are unlikely to change.

CASE Tools

A software development project is a time-consuming and complex activity that requires the resources of a number of people and a significant financial outlay. A great deal of effort is required to establish the objectives of the project, which are then formalised in a requirements specification. Further effort must be expended in designing a solution that meets all of the requirements within the specified schedule, and within budgetary constraints. Once the design has been finalised, perhaps the most work-intensive part of the project is the implementation phase, during which all of the program code is written and tested. All of these activities must be documented, and the documentation maintained in an up-to-date state, to facilitate future system maintenance or enhancement. Any means of automating these activities, in whole or in part, can help to minimise the time required to complete the project and thus significantly reduce overall project costs.

Computer-Aided Software Engineering (CASE) is the application of a range of software tools to the development of information systems. Indeed, software tools can be applied to the entire range of activities involved in the systems development lifecycle, including the analysis, design, implementation, testing, documentation, and maintenance of information systems. Since all stages of the software development life-cycle can be directly supported by software of one kind or another, a broad definition of the term "computer-aided software engineering" might include project management software, compilers, assemblers, and linkers in the list of CASE tools. Usually, however, only those tools directly involved in the analysis, design and coding of information systems are considered to be CASE tools.

The first CASE tools were often developed to help software developers carry out a specific task, such as the production of flowcharts or the automated generation or re-factoring of program code. Later, integrated families of CASE tools were developed that offered support for a range of activities related to the development of information systems. Some of these integrated CASE tool suites were designed to support a particular development methodology. There are a number of proprietary applications, for example, that provide support for SSADM. Such applications facilitate the creation of common SSADM documents such as data flow diagrams, process descriptions, entity relationship diagrams, and entity life history diagrams.

This article is an aggregation of two articles covering the topics of development methodologies and case tools respectively, both first published on the website in January 2009.

Life Cycle Models


The Waterfall model

The "waterfall" model is a sequential software development model suggested by W. W. Royce in which development is seen as flowing steadily downwards (like a waterfall) through a number of phases. In most system life cycles based on Royce's original model, the sequence of phases used is similar to that shown in the diagram below.

The Waterfall life cycle model

To follow the waterfall model, you proceed from one phase to the next in a purely sequential manner. You should move to the next phase only when the current phase is complete. The phases of development are discrete, and do not allow for jumping back and forth, or any overlap between the phases, although there are modified versions of the waterfall model that are not so rigid.

The waterfall model is widely used (for example, by those involved in producing software for the US Department of Defense and NASA). The central idea behind the waterfall model is that time spent early on in a project making sure that the requirements and design phases are absolutely correct will save more time and effort later. Requirements should therefore be set in stone before design is started, and the program's design should be perfect before work begins on the implementation phase.

The model places much emphasis on documentation, and proponents of the model claim that its advantages include simplicity and structure. The model progresses in a linear fashion through discrete, logical phases, and is easy to understand. It also provides recognisable "milestones" in the development process. Perhaps for these reasons, it is often used as a first example of a development model in software engineering texts. It is argued that the waterfall model is suited to projects that are stable (i.e. whose requirements are unlikely to change significantly) and where the system designers are able to evaluate potential problems before implementation begins.

Critics of the waterfall model argue that it is impossible, for any non-trivial project, to get one phase of the lifecycle perfected before moving on to the next phase. For example, clients may not be aware of their exact requirements until they see a working prototype, or they may change their requirements constantly, necessitating frequent changes in design. Furthermore, problems encountered during the implementation phase may be better solved by revisiting the design phase than by trying to work around the problems using the original design. The idea behind the waterfall model is "measure twice, cut once". This idea tends to fall apart, however, when what is being measured is constantly changing due to changes in requirements, or the emergence of major unforeseen problems. The counter-argument is that experienced designers may have worked on similar systems already, and can thus anticipate many of the problems that might arise.

Many modified waterfall models have emerged which address some of the criticisms of the original model. Royce's own final model allowed for the results of the testing phase to be fed back into the design phase, and problems found during the design phase to be fed back into the requirements specification phase. Overlapping the stages makes this kind of feedback possible, but also makes it difficult to determine exactly when a particular phase is complete.

The Incremental Model

The Incremental life cycle model

The incremental model allows different parts of the system to be developed separately, each with its own life cycle iteration. Each iteration includes a requirements analysis, design, implementation and testing phase. The first iteration usually produces a working version of the system on which subsequent iterations build. The incremental model provides a more flexible and less costly alternative to the waterfall model in which changes to the scope or requirements of the project can be more easily accommodated. Each iteration can be completed relatively quickly, and is easier to manage. Like the waterfall model, however, each phase of each iteration must be completed before the next phase can commence, and the overall system architecture may be poorly defined in the early stages of the project.

The V-shaped Model

The V-shaped life cycle model

Like the waterfall model, each phase of the V-shaped life cycle must be completed before the next phase begins. In this model, however, there is much more emphasis on testing, and test procedures are developed early in the life cycle. System tests are designed during the requirements analysis phase to assess whether the requirements determined there have been met. Integration tests are developed during the high-level design phase, which focuses on the overall system architecture, and unit tests are planned to assess the success of the low-level design phase, which concentrates on individual system components. Life cycle execution proceeds down the left-hand side of the V, with the test plan for each phase developed at the same time. Once the implementation phase has been completed, execution proceeds up the right-hand side of the V, and the test plan for each phase is executed in turn.

Like the waterfall model, the V-shaped model is easily understood and has well defined phases with specific deliverables. The chances of success are probably higher than for the waterfall model due to the development of test plans early in the cycle, especially where projects are relatively small and the requirements are well understood. This model lacks the flexibility of the incremental model however, with no intermediate prototypes produced. The model does not specify what course of action should be taken if significant problems emerge during the testing phases.
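The distinction the V-model draws between test levels can be sketched in code. This is a hypothetical example (the `Tokenizer` and `WordCounter` components are invented for illustration): a unit test exercises one component in isolation, corresponding to the low-level design phase, while an integration test exercises the components working together, corresponding to the high-level design phase.

```python
# Two small components, tested at two of the V-model's test levels.

class Tokenizer:
    """Low-level component: splits text into words."""
    def tokenize(self, text):
        return text.split()

class WordCounter:
    """Higher-level component: counts words using a Tokenizer."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def count(self, text):
        return len(self.tokenizer.tokenize(text))

# Unit test: the Tokenizer in isolation (maps to low-level design).
assert Tokenizer().tokenize("a b c") == ["a", "b", "c"]

# Integration test: Tokenizer and WordCounter together (maps to high-level design).
assert WordCounter(Tokenizer()).count("the quick brown fox") == 4
```

A system test, the highest level on the right-hand side of the V, would exercise the complete application against the original requirements rather than individual components.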

The Spiral Model

The spiral life cycle model

The spiral model is similar in many ways to the incremental model, with more emphasis on risk analysis. The project passes repeatedly through the four phases of planning, risk analysis, development and evaluation. System requirements are gathered during the planning phase and alternative solutions are generated. The risk analysis phase identifies the risk associated with each alternative solution (for example, the likelihood of a cost overrun or prohibitive operating costs). The development phase results in a prototype, which is then thoroughly evaluated in the evaluation stage before the next iteration of the spiral is undertaken. The cycle continues its iterations until the evaluation phase determines that the system requirements have been met in full (or until an unacceptable level of risk forces the project to be terminated).

The cost of the project increases as the spiral goes through successive iterations. The high degree of risk analysis and the production of a prototype system early in the life cycle makes this a good option for large and complex projects, although it can be potentially costly, and the level of expertise required for the critical risk analysis phase means that it is not really suited to smaller projects.

This article is an aggregation of two articles on the subject of life cycle models first published on the website in January 2009.

The Systems Life Cycle


The term "system life cycle" can be applied to many kinds of endeavour, but in the context in which we are interested (information systems), it is used to describe the process by which an existing information system is replaced with another system. The idea of a 'life cycle' implies that the process has a beginning, an ending, and something going on in between.

Any new information system is carefully designed to fulfill a specific set of requirements, which are themselves derived only after a thorough investigation of the organisation's business needs. If correctly designed and implemented, the system should enable users of the system to be more productive, and management should see their business running more efficiently and profitably. Provided the system is adequately maintained and supervised, this situation may persist for a number of years, with the system doing everything that it was designed to do.

As time passes, however, many things will change. Technology evolves constantly. Personnel move on and are replaced. The business interests of the company may evolve and grow. Social, economic and environmental conditions change. The system that initially worked so well will begin to show its age. As its ability to adapt to new business requirements and changing conditions is taxed more and more, users will begin to become dissatisfied with system performance, and business efficiency will start to suffer. At this point, the life cycle of the current system is about to end, and that of a new system will begin.

Change, however, involves risk, especially in the field of IT. An alarming number of new IT systems fail to perform as specified, many are late, and many are over budget. A significant proportion of new systems fail to meet the requirements laid down for them, or are abandoned before they are complete. Many businesses neglect to properly train their staff in the use of new information systems, and many simply do not understand the capabilities and limitations of the technology. Currently, less than a fifth of all IT projects can claim to meet all of their objectives. In an effort to reduce the risk of failure, the systems life cycle breaks a project down into a series of well-defined stages, from the formulation of the initial concept through to the implementation and maintenance of a fully working system.

Life Cycle Phases

There are many lifecycle models in use. The number of phases, and the activities that take place within each phase, vary from one model to another. The generic phases outlined below, however, will appear in some form in most, if not all, of these models:

  1. Requirements analysis - in this stage, the scope of the system is defined, and a set of requirements is derived (i.e. the functionality and levels of service that the new system must provide). The requirements are then formally documented in a requirements specification, which forms the input to the design phase. The activities that take place in the analysis phase may include an initial feasibility study to determine whether the project justifies the resources required to complete it. The existing system will then be thoroughly investigated to determine what business processes are involved, the nature and volume of the data used, and the issues that need to be addressed by the new system.
  2. Design - a system will be designed to meet the requirements specified as the result of the previous phase. The output of this phase will be a blueprint that is used by systems and software engineers to implement the hardware and software components for the new system. Some of the issues dealt with here include database design, the design of data entry screens, detailed software specifications, and system architecture.
  3. Implementation - in this phase the system is actually built. Hardware is installed, database and software applications are written, and data is loaded into the system.
  4. Testing - once the system has been built, it should be thoroughly tested before it is handed over to the client, to ensure that all of the specified requirements have been met and that the system functions correctly under operational conditions. Testing of individual system components (unit testing) is normally carried out throughout the implementation stage so that minor problems can be corrected before they have a chance to escalate into major ones.
  5. Maintenance - once the client has accepted the new system and it is in daily use, there will inevitably be adjustments to be made to the system throughout its useful life. Hardware elements will need to be replaced or upgraded, and the need for additional functionality or information processing features may necessitate modifications to software or database programs. Eventually, changes in the business environment in which the system must operate will create the need for a new system, and the cycle will begin again.

This article is an aggregation of two articles on the subject of the systems life cycle first published on the website in January 2009.

Change and Decay . . .


At the beginning of the 21st Century, it is reasonable to assume that most organisations of any size will employ information technology to a greater or lesser degree. This implies that they have made a considerable investment both in developing the information system itself, and in training their employees to operate and maintain the system. The users are familiar with its operation, the IT staff are familiar with the technology, and management have a good grasp of the capabilities of the system.

The implementation of a new system, or even a major upgrade to an existing system, is likely to be expensive and fraught with risk. Hardware and software may need to be replaced or upgraded, staff may need to be re-trained, and the network cabling may have to be ripped out and replaced. There are, of course, a number of reasons why organisations are prepared to undertake the expense and face the risks of introducing a new system.

The system may no longer be fit for its intended purpose. This may occur for a number of reasons. The work of the organisation may have changed or evolved over a period of time to such an extent that the existing system is no longer ideally suited to current work practices. Growth, too, is an inevitable and desirable aspect of any successful enterprise, and invariably involves an increased number of staff, a higher volume of working data, and a greater number of business transactions in a given period of time.

The organisation's business may have grown to such an extent that the system can no longer cope with the workload. External factors can also put additional pressure on the system. New government legislation, for example, may require that more stringent security measures are implemented for certain types of data, or require changes to the type of information that is held on the system, or increase the period for which certain types of financial information must be held.

Information technology is probably one of the fastest changing fields of human endeavour. Computer hardware and software, network technology, and global communications systems are constantly evolving in terms of both the speed at which they operate and the functionality offered. If a competitor is using a more advanced system that provides greater efficiency, reduced cost, and enhanced capabilities, they are clearly at an advantage.

Customers, too, often provide the impetus for change by demanding that the systems used by their suppliers are upgraded to facilitate a timely and efficient exchange of data. Many software vendors will only support a particular version of a software package for a limited period of time. This, together with the fact that older software packages are often unable to take full advantage of advances in hardware, means that the system software must at some point be updated.

This article was first published on the website in January 2009 as an introduction to the Systems Analysis section of that website.