Alarm Rationalization & Management.

Alarm rationalization is a structured process that reviews all potential or current alarms based on set guidelines in an alarm philosophy document. Its main purpose is to determine which alarms are crucial, design them appropriately, and note details like their cause, impact, and required operator action.

The objective is to ensure each alarm brought to the operator’s attention is relevant and meaningful. A good alarm alerts the operator about unique abnormal situations and provides necessary actions to prevent undesirable outcomes. By streamlining this process, we can reduce unnecessary alarms, prioritize crucial alerts, and enhance the operator’s efficiency and effectiveness.

What is Alarm Rationalization

The goal is to develop a system with the right number of alarms, namely those essential for maintaining safety and keeping the plant within standard operational limits. Alarm rationalization involves a team of varied plant experts who assess and validate each alarm against the guiding principles for alarm use.

Every tag in the control system should undergo this review. Even tags without alarms are evaluated: the team decides whether to keep them alarm-free or to suggest introducing new alarms. Skipping tags without alarms might lead to missed opportunities in alarm optimization. For example, an existing alarm could be replaced with a new one on a different tag that identifies the abnormal condition more clearly.

Alarm Management Procedure defines the following:-

This procedure does not apply directly to annunciator panels or dedicated trip / shutdown systems, unless they act as a source of Alarms that are repeated to, or annunciate solely on, the automation system. Many of the principles laid out in this procedure may nevertheless be relevant to the implementation or modification of such systems.

In addition, this procedure governs Documentation and Rationalization (D&R) process and serves as a long-term guide for Alarm system improvements and maintenance.

The Asset Manager, or a person of equivalent cadre from the Client’s organization, acts as the “Process Owner” for Alarm Management and shall demonstrate an ongoing commitment to improving and maintaining Alarm performance at acceptable levels by continuously sponsoring Alarm management improvement activities.

This includes, but is not limited to, ensuring that sufficient and competent resources are in place to drive Alarm improvements forward and sustain this performance in line with Process Safety Management framework.

The Alarm system Process Owner can delegate aspects of Alarm management, but retains accountability for overall Alarm system specification, Performance and conformance with this procedure.

In terms of the management of Alarm systems for Process Industries, an Alarm class is a “collection of Alarms with a routine set of Alarm management requirements (e.g. training, testing, monitoring, and audit requirements)”. One category of Alarm classification is referred to as safety critical Alarms. These Alarms are specifically designated as being critical to the safety of a given process, with the aim of safeguarding human life and the environment.

There are six distinct categories of Alarms:

An Alarm will, typically, notify the operator that: –

The operators shall acknowledge the Alarm, silence the Alarm sounder and take corrective action.

The Alarm rationalization process represents a comprehensive engineering review, and the appropriate re-engineering of the Alarm configuration parameters. Rationalization is the process by which Alarms are evaluated to determine their need to exist, their priority, their activation points, their dead bands and the summary information needed for their understanding and management. In order to optimize the effectiveness of an Alarm system, all process Alarms within an operator’s area should be rationalized at the same time. The possible exceptions are Alarms required to be located in an independent Alarm system.

Representatives from other departments shall be consulted as necessary. Rationalization team members shall be rotated as necessary to maximize the efficiency of the rationalization process. Rationalization begins by applying the “Method of Flows” to a given unit area. The Method of Flows focuses on rationalizing Alarms following the flow of the P&IDs.

Deciding the most appropriate priority for an Alarm requires accounting for both the severity of the consequence and the time within which the Operator can effectively correct the condition. Combining the severity factor and the response time gives a systematic approach for setting Alarm priorities. The Alarm priority assignment matrix below provides the guidelines for determining the priority of an Alarm.
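The lookup that such a matrix implies can be sketched in a few lines of Python. This is only an illustration: the response-time bands, the middle severity name ("moderate") and every cell value are hypothetical placeholders, not the approved site matrix. The only rule taken from this procedure is that Highly Managed Alarms go straight to the Critical priority.

```python
# Hypothetical sketch: all bands and cell values are placeholders, not the
# approved site matrix.
SEVERITIES = ("minor", "moderate", "major")  # "moderate" is an assumed name

# Rows: time available to respond; columns follow SEVERITIES.
PRIORITY_MATRIX = {
    "under_5_min": ("Medium", "High", "High"),
    "5_to_15_min": ("Low", "Medium", "High"),
    "over_15_min": ("Low", "Low", "Medium"),
}


def response_band(minutes_available):
    """Map the time available to respond onto an (assumed) band name."""
    if minutes_available < 5:
        return "under_5_min"
    if minutes_available <= 15:
        return "5_to_15_min"
    return "over_15_min"


def assign_priority(severity, minutes_available, is_hma=False):
    """Combine severity and response time; HMAs are always Critical."""
    if is_hma:
        return "Critical"
    row = PRIORITY_MATRIX[response_band(minutes_available)]
    return row[SEVERITIES.index(severity)]
```

With these placeholder values, a major consequence with 3 minutes to respond is assigned High, while a minor consequence with 30 minutes to respond is assigned Low.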

Rationalization is accomplished by evaluating every potential Alarm in the system individually, using the P&IDs as the guide and following the flowchart shown below.

The first step is to determine if an Alarm is needed. If an Alarm is needed, an appropriate activation point and priority is established. The appropriate dead band is also to be determined. The Alarm activation point is determined by the amount of time required for the operator to see the Alarm, interpret the Alarm, decide on a corrective action, and implement the corrective action as calculated.
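As a rough illustration of how the operator response time drives the activation point, the sketch below backs a high Alarm set point off from the limit at which the consequence occurs. It assumes a constant worst-case rate of rise, which is a simplification, and the function and parameter names are hypothetical.

```python
def high_alarm_setpoint(consequence_limit, max_rate_per_min,
                        see_min, interpret_min, decide_min, act_min):
    """Back the set point off from the consequence limit.

    Assumes the process variable rises at a constant worst-case rate
    (max_rate_per_min, in engineering units per minute) while the operator
    sees, interprets, decides on and implements the corrective action.
    """
    response_time_min = see_min + interpret_min + decide_min + act_min
    return consequence_limit - max_rate_per_min * response_time_min


# Illustrative numbers only: limit at 90 %, level rising at 2 %/min,
# and 1 + 2 + 1 + 4 = 8 minutes of total operator response time.
setpoint = high_alarm_setpoint(90.0, 2.0, 1, 2, 1, 4)  # 90 - 2 * 8 = 74.0
```

In practice the result would still be checked against the normal operating range so that the set point does not sit inside it and generate nuisance Alarms.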

The rationalization team consists of a Facilitator, at least one knowledgeable Operator, a Process Engineer, a Process Safety Engineer and an Instrument Engineer. Not all members of the team need to work at the same time. Other stakeholders with knowledge of the process unit, its operation, hazards, and the Alarm philosophy shall participate as needed.

The following section details the necessary methodical steps required to perform a site or system prioritization exercise. Alarm rationalization ensures that all Alarms are assigned an appropriate priority, such that the operators can identify the order in which the Alarms should be addressed:

All Alarms shall be categorized based on the consequences of the hazard if the Alarm is not acted upon and on the time available to respond (refer to section 8.6). Any Alarms classified as HMA (Highly Managed Alarms) shall be given the highest priority.

The following Alarm priorities are to be considered:

1. Critical

2. Priority 1 – High

3. Priority 2 – Medium

4. Priority 3 – Low

The Industrial Automation and Control Systems (IACS) shall be configured with following priority cases:

Critical / Major: Identified as part of a safety assessment, e.g. FTA or LOPA and shall include all HMAs.

The consequence of a missing or incorrect response by the process Operator to the Alarm will be assessed using the RAM Matrix.

High Risk: Unacceptably high. This level of risk exposes People, Assets, Environment or Reputation to intolerable losses. The hazard should be eliminated or reduced to tolerable levels immediately. It is imperative to promptly implement measures aimed at reducing the level of risk.

Medium Risk: The hazard must be managed to reduce the frequency and / or the severity of the hazardous events. It is imperative to plan and put in place measures aimed at reducing risks.

Low Risk: Acceptable without requiring further action. Corrections may be applied as resources allow.

Applied to the configured Alarm priorities, this results in the following Alarm priorities for the available systems:

Alarm response for process action

Alarms are to be set at the point where the process is about to move from normal operating conditions to an area that is still safe (i.e. not near the shutdown limit) but will require Operator intervention to bring the process back into the normal operating range.

ALARM threshold levels

Traditional analogue point Alarm thresholds should be used to warn the Operator when the process is moving outside the normal working envelope. The threshold values should be set so that they allow time for Operator intervention whilst having sufficient gap between operation and Alarm to avoid nuisance Alarms.

The appropriate priority for an Alarm point will be established by considering the severity of the probable resulting event, and the operator response time required.

The severity levels are based on the following four categories of consequences

The amount of impact on each of these four categories is captured by assigning an appropriate potential severity level 0, 1, 2, 3, 4 & 5 and the likelihood probability as below

Here the above three levels of severity will be used, with Minor as the least severe and Major as the most severe severity level.

A rationalization exercise should be carried out once every 2 years after commissioning. The rationalization activity shall be supported by the Alarm system performance audit. All Alarms shall be rationalized before implementation in the DCS to confirm that they meet the criteria for a good Alarm and to ensure that all relevant Alarm rationalization guidelines are included as per Appendix A.

Whilst the challenges of a bulk retrospective exercise differ from those of a new project, the principles and process to be followed are the same: all proposed Alarms are to be evaluated against the agreed definitions given above in this procedure. Any proposed Alarm which does not “Qualify” as an Alarm is to be re-categorized “No Alarm” or an “Event”.

For new projects and system modifications, Alarm rationalization is a key stage in the Alarm management lifecycle and is required to be completed before detailed DCS specification and design. Alarm rationalization is to be a phased process, invoked as necessary to sustain performance at target levels as part of Alarm system continuous improvement. Therefore, the project team shall incorporate an Alarm rationalization exercise as part of all projects where any Alarms are being added or modified.

It is imperative that a methodical and structured approach is adopted in the design of each Alarm, with all design choices duly recorded. These records shall be maintained throughout the remaining Alarm management lifecycle stages. The Master Alarm database shall include, as a minimum, the information detailed below. Additional fields can be included based on the project requirements, such as:

Alarm system Procedure shall include:

The Alarm procedure for any changes to the Alarm system shall include, as a minimum:

Alarms are activated when the process value diverges from typical operational parameters, specifically in the presence of anomalous operating scenarios. Alarm Management pertains to the proficient development, execution, functioning, and upkeep of Alarms in Industrial manufacturing and process plant settings.

Standing Alarms are Alarms that remain in the active state for a prolonged duration. A high number of standing Alarms may indicate inefficiency in operations and maintenance, or that the Alarm system is generating many Alarms that do not necessarily need operator intervention.

The fundamental characteristics of a ‘Good’ Alarm are as follows;

If an Alarm does not have a defined response, or does not give the operator adequate time to respond, then by the above definition it should not be considered an Alarm.

An Alarm should NOT occur:-

These principles shall apply to all Alarms regardless of Alarm initiating source or Alarm priority. It is generally accepted that there is no preferred method for identification and selection of Alarms and it is envisaged that Alarms will continue to be specified through a diverse range of techniques, among which are, but not restricted to: –

Care should be taken when selecting the Alarm set points to ensure the Process Operators have sufficient time to respond to Alarm and carry out defined operator response to prevent the undesired event from occurring.

Applicable Alarm sources include:-

ESD trips should not be considered as Alarms, when there is no defined operator response required (automatic initiation has occurred).

Going forward special considerations for ESD systems shall include the following:

An Alarm deadband is an Alarm attribute within the process control system that requires the process variable to move back past the Alarm set point into the normal operating range by some percentage of the range before the Alarm clears. Deadbands are commonly set by taking into account the normal operating range of the process variable, the level of measurement noise, and the nature of the process variable. They have proven highly effective in reducing nuisance Alarms.

For Alarms generated from analog measurements, the use of a deadband is very effective in eliminating chattering Alarms. When a deadband is applied, the Alarm is set to be in Alarm at one value, but cleared at a different level. The picture below depicts this concept:
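The set-at-one-value, clear-at-another behaviour can also be sketched in code. The example below is a minimal illustration for a high Alarm, with the deadband expressed in engineering units rather than percent of range; the function names and numbers are hypothetical.

```python
def high_alarm_states(values, setpoint, deadband):
    """Return the Alarm state (True = in Alarm) after each sample.

    The Alarm sets when the value reaches the set point and clears only
    once the value has dropped below (setpoint - deadband).
    """
    active = False
    states = []
    for value in values:
        if not active and value >= setpoint:
            active = True
        elif active and value < setpoint - deadband:
            active = False
        states.append(active)
    return states


def count_activations(states):
    """Count transitions from normal into Alarm."""
    previous = [False] + states[:-1]
    return sum(1 for was, now in zip(previous, states) if now and not was)


# A noisy signal hovering around a set point of 80.
noisy = [79.0, 80.5, 79.8, 80.2, 79.5, 77.9, 78.5]
without_deadband = count_activations(high_alarm_states(noisy, 80.0, 0.0))  # 2
with_deadband = count_activations(high_alarm_states(noisy, 80.0, 2.0))     # 1
```

With no deadband the same excursion annunciates twice; with a deadband of 2 units it annunciates once and clears only after the value falls below 78.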

The sample table below provides recommendations that represent a good starting point for common process measurement types.

The state of an Alarm at any point in time can be described by one of a number of Alarm states, which include:-

The Alarm state transition diagram is shown below.

Alarm transition key

The table and figure below describe the Alarm state definition and Alarm state transition diagram with option.

Acknowledged alarm process
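For readers who prefer code to a diagram, the core of an Alarm state machine can be sketched as a transition table. This sketch assumes only the four basic states commonly described in alarm management standards such as ISA-18.2 (normal, unacknowledged, acknowledged, and returned to normal but unacknowledged). It leaves out the shelved, suppressed and out of service states, and the event names are hypothetical.

```python
from enum import Enum


class AlarmState(Enum):
    NORMAL = "normal"
    UNACK = "unacknowledged alarm"
    ACKED = "acknowledged alarm"
    RTN_UNACK = "returned to normal, unacknowledged"


# (current state, event) -> next state; event names are assumptions.
TRANSITIONS = {
    (AlarmState.NORMAL, "condition_on"): AlarmState.UNACK,
    (AlarmState.UNACK, "ack"): AlarmState.ACKED,
    (AlarmState.UNACK, "condition_clear"): AlarmState.RTN_UNACK,
    (AlarmState.ACKED, "condition_clear"): AlarmState.NORMAL,
    (AlarmState.RTN_UNACK, "ack"): AlarmState.NORMAL,
    (AlarmState.RTN_UNACK, "condition_on"): AlarmState.UNACK,
}


def next_state(state, event):
    """Apply one event; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Typical sequence: Alarm occurs, operator acknowledges, process recovers.
state = AlarmState.NORMAL
for event in ("condition_on", "ack", "condition_clear"):
    state = next_state(state, event)
```

Keeping the transitions in one table makes it easy to compare the configured behaviour of a given system against its own state transition diagram.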

The Alarm system shall be capable of providing the operator with a clear overview of all shelved, suppressed and out of service Alarms in the system via status lists. The status lists shall provide the following information:

At the start of each shift, the list of inhibits and overrides shall be reviewed to ensure that no inhibits and overrides are inadvertently left on the system for longer than necessary. Assets should develop a guideline(s) to manage the shelving and suppression of Alarms and include discussion on list of inhibits and overrides during each shift handover.

The following are Alarm handling Techniques :-

Manual suppression (Alarm Shelving), defined as preventing indication of the Alarm to the operator when the base Alarm condition is present, is a useful function for helping to ensure that Alarms are only presented to operators if they are deemed relevant.

Automatic Suppression (Suppression by Design): an effective designed suppression scheme includes a state (event) detection algorithm and a suppression rule set. If the conditions defining the state detection are satisfied, the state suppression rules are applied to determine which Alarms should be suppressed from the operator. The state detection algorithm can combine inclusive criteria (i.e. AND conditions) with alternative criteria (i.e. OR conditions). A collection of rules designed to suppress Alarms is commonly called an Alarm suppression group.
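A suppression group of this kind can be sketched as a small data structure holding the AND conditions, the OR conditions and the tags to suppress. The class, tag names and plant signals below are hypothetical and only illustrate the logic; a real scheme would be configured in the control system itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Condition = Callable[[Dict[str, object]], bool]


@dataclass
class SuppressionGroup:
    name: str
    suppressed_tags: Set[str]
    all_of: List[Condition] = field(default_factory=list)  # AND conditions
    any_of: List[Condition] = field(default_factory=list)  # OR conditions

    def state_detected(self, plant):
        """True when every AND condition and at least one OR condition hold."""
        and_ok = all(cond(plant) for cond in self.all_of)
        or_ok = any(cond(plant) for cond in self.any_of) if self.any_of else True
        return and_ok and or_ok


def annunciated(active_alarms, groups, plant):
    """Return the active Alarms that are still presented to the operator."""
    hidden = set()
    for group in groups:
        if group.state_detected(plant):
            hidden |= group.suppressed_tags
    return [tag for tag in active_alarms if tag not in hidden]


# Example: pump stopped AND (suction valve closed OR unit in shutdown mode).
pump_stopped = SuppressionGroup(
    name="P-101 stopped",
    suppressed_tags={"FAL-101", "PAL-102"},
    all_of=[lambda p: not p["P-101_running"]],
    any_of=[lambda p: p["XV-101_closed"], lambda p: p["unit_mode"] == "shutdown"],
)
```

The Alarms themselves still evaluate as usual; only their presentation to the operator is filtered while the detected state persists.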

State-based suppression, also referred to as static Alarm suppression, suppresses Alarms that remain active at all times and carry no significance while a process area, unit, or equipment is operating in a specific mode. This approach has proven effective in reducing Alarms that have lost their relevance.

Alarm flood suppression, also referred to as dynamic Alarm suppression, involves the real-time management of predetermined sets of Alarms that are triggered by specific equipment states and events. It suppresses Alarms following an event (e.g. the failure of a distillation column) in cases where they lack relevance or significance to the operator.

Alarm masking is an action that prevents events from being reported to the system. Although the Alarms will continue to be triggered as usual, masking them prevents the selected Alarms from being treated like real emergencies, meaning that alerts are not sent out and no further steps need to be taken.

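Alarms that have the same Operator response can be grouped so that the Operators are presented with a single grouped Alarm. The grouping can use an OR function, where any one active input raises the grouped Alarm, or voting logic such as 2oo3, where the grouped Alarm annunciates only when two out of three input Alarms are active.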
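The difference between OR grouping and voting can be seen in a short sketch. The function names are hypothetical; the inputs are simply the active/inactive states of the individual Alarms in the group.

```python
def or_grouped_alarm(inputs):
    """Grouped Alarm is active when any single input Alarm is active."""
    return any(inputs)


def voted_alarm(inputs, required):
    """Grouped Alarm is active when at least `required` inputs are active.

    With three inputs and required=2 this is 2oo3 voting.
    """
    return sum(1 for active in inputs if active) >= required


one_of_three = [True, False, False]
two_of_three = [True, False, True]

or_grouped_alarm(one_of_three)   # True: any single input is enough
voted_alarm(one_of_three, 2)     # False: only one of three is active
voted_alarm(two_of_three, 2)     # True: two of three are active
```

Voting is the more tolerant choice when a single faulty transmitter should not annunciate on its own, while OR grouping never hides an active input.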

This technique addresses scenarios in which multiple Alarm points are generated from a single process measurement, for example the level measurement of a vessel. When the liquid level reaches the high-high level, it follows that the high level has already been exceeded. In this scenario, the Alarm system should therefore be designed so that the high level Alarm is suppressed once the high-high level Alarm activates.

The Alarm system usually supports advanced Alarm techniques, such as Model based Reasoning, Pattern recognition, knowledge-based reasoning, Neural nets and Fuzzy logic. The Operator user interface and its components shall be designed in line with Functional Design Specification Human Machine Interface, to support and augment Alarms by providing the console Operators with good situational awareness and response capabilities. Several Human factors must be taken into account when designing Operator interfaces that are effective.

The Alarm overview /summary display provides a multi-page display of existing and/or unacknowledged Alarms. Alarm Overview / Summary List shall be organized in different pages:

The Alarm Summary list shall contain the following for each Alarm:

Alarms are displayed to the process operators using a variety of displays and associated features. These include:-

Details on an individual Alarm can be accessed by opening Alarm faceplate from  Alarm display and accessing the Alarm Tab.

An Alarm message shall be generated for each Alarm, providing further information on Alarm, beyond priority and status. The Alarm message shall be displayed in the Alarm summary. Alarm messages shall not be displayed in process graphic. The text should be meaningful and unambiguous and shall follow these rules:

HMI or Operator station serves as a primary interface to Operators and various users and provides Alarms and events information in real time. The HMI or Operator stations consist of the following display panes:

The HMI or Operator stations should have the capability to selectively display particular Alarms and events of interest to users by means of filters, shelving and so forth. However, the range of process Alarms and events information displayed shall depend solely on the configuration of the standard User Security function. The configuration of the HMI and Operator Station should not alter the range of process Alarms and events information displayed, unless a specific requirement arises.

Audible indication shall be used to inform the operator of a particular Alarm in the control system. Different Alarm tones shall be configured in the DCS as shown in the sample table below.

The installation, testing and validation of the Alarm system shall follow the Functional Design Specification and the test procedures approved during the project phase. The implementation stage precedes the operation stage, so it is essential that the necessary Alarm system procedure is available to the operator and reflects complete and correct information.

Projects with the potential to impact the Alarm system shall consider the following:-

Alarm System Operation is the Alarm management lifecycle stage that follows implementation and return from maintenance. In the operation stage, the Alarm system is active and performs its intended function. The appropriate tools can be used for Alarm handling within the operating state.

Alarm shelving is an important and advanced capability that helps operators manage nuisance Alarms. It is an operator-triggered mechanism designed to temporarily inhibit an Alarm. Its purpose is to let operators temporarily hide Alarms that they perceive as extraneous or disruptive, so that they can concentrate on Alarms that demand their immediate attention. However, poorly controlled shelving is also a common contributing factor in incidents related to Alarm management.
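One common safeguard is to allow shelving only for a limited time and with a recorded reason, so that a shelved Alarm cannot be forgotten. The sketch below illustrates that idea under stated assumptions: the class, the maximum duration and the use of plain minute counters instead of real timestamps are all hypothetical simplifications.

```python
class ShelfList:
    """Tracks shelved Alarms; times are plain minute counters for simplicity."""

    def __init__(self, max_minutes):
        self.max_minutes = max_minutes
        self._entries = {}  # tag -> (expiry_minute, reason)

    def shelve(self, tag, now, minutes, reason):
        """Shelve a tag for a limited time and record why."""
        if minutes > self.max_minutes:
            raise ValueError("requested shelving time exceeds the allowed maximum")
        self._entries[tag] = (now + minutes, reason)

    def is_shelved(self, tag, now):
        entry = self._entries.get(tag)
        if entry is None:
            return False
        if now >= entry[0]:
            del self._entries[tag]  # time limit reached: unshelve automatically
            return False
        return True

    def status(self, now):
        """Return (tag, reason, minutes_remaining) for each Alarm still shelved."""
        return sorted(
            (tag, reason, expiry - now)
            for tag, (expiry, reason) in self._entries.items()
            if expiry > now
        )


shelf = ShelfList(max_minutes=60)
shelf.shelve("LI-202", now=0, minutes=30, reason="transmitter under repair")
```

The `status` list is the kind of overview that can be reviewed at shift handover, since it shows what is shelved, why, and for how much longer.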

Project requirements for Alarm shelving include:

Basic activities that promote and maintain good Alarm management throughout the life of the Alarm system may include:

The requirements and recommendations for Alarm system performance monitoring and associated metrics include:

At the time of design, it will probably be difficult to predict what the Alarm occurrence rate will be in practice. As a guide it is suggested that during design, Alarms shall be configured in the approximate ratio as shown in the Sample Table. Performance should then be reviewed during commissioning and early operation, and priorities should be adjusted to achieve the performance similar to that shown in the Sample Table below.

It is strongly emphasized that the number in the above Table should be taken as approximate indicators of effective discrimination between priorities rather than exact targets. In particular, the priority distribution is expected to be dependent on the type of plant and the speed of response required. On plants with fast dynamic responses, there are likely to be a higher proportion of higher priority Alarms.
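Checking a configured or annunciated Alarm list against such a ratio is a simple counting exercise. The sketch below computes the percentage of Alarms at each priority; the input list is illustrative and no target values are implied by it.

```python
from collections import Counter


def priority_distribution(priorities):
    """Return the percentage of Alarms at each priority, rounded to 0.1 %."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Illustrative list of configured Alarm priorities.
configured = ["Low"] * 8 + ["Medium"] * 1 + ["High"] * 1
distribution = priority_distribution(configured)
# {'Low': 80.0, 'Medium': 10.0, 'High': 10.0}
```

Running the same calculation on annunciated Alarms during commissioning shows whether the priorities discriminate in practice the way the design intended.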

To ensure that the Alarm system and the plant continue to function as an entity, the Instrument maintenance team shall periodically review the number of chattering, consequential, duplicate, repeating, stale and standing Alarms and attempt to eliminate them. In addition, the number of Alarm floods, the time to clear each Alarm and the number of tags in calibration mode shall be analyzed to assess potential problems with the plant.
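Counting Alarm floods from an event log can be sketched as follows. The code buckets Alarm timestamps into fixed 10-minute windows and flags windows with more than 10 Alarms. That threshold is a commonly cited benchmark and is an assumption here, not a figure taken from this procedure; the function names are also hypothetical.

```python
from collections import Counter


def alarms_per_window(timestamps_s, window_s=600):
    """Count Alarms in consecutive fixed windows (timestamps in seconds)."""
    return Counter(int(t // window_s) for t in timestamps_s)


def flood_windows(timestamps_s, threshold=10, window_s=600):
    """Return the indexes of windows containing more than `threshold` Alarms."""
    counts = alarms_per_window(timestamps_s, window_s)
    return sorted(window for window, n in counts.items() if n > threshold)


# Illustrative log: 12 Alarms in the first 10 minutes, 3 in the next 10.
log = [10 * i for i in range(12)] + [600, 700, 800]
floods = flood_windows(log)  # [0]
```

The same per-window counts also give the average and peak Alarm rates that are typically reported alongside the number of floods.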

To ensure that the Alarm system continues to meet the targets described in this procedure, the following metrics shall be used as a reference to assess Alarm system performance on a regular basis. An example of an Alarm summary sheet is given in the table below.

A five-level model has been devised which can be used to define an appropriate target for new systems or as a way of measuring where a system currently stands and where it is seeking to move to.

The five levels of Alarm system performance range from ‘Overloaded’ at the bottom end of the scale, through ‘Reactive’, ‘Stable’ and ‘Robust’ to ‘Predictive’ as the highest level of performance. This model is depicted in the figure below.

Alarms and the Alarm system are a key layer of protection. Hence, any changes shall be managed to ensure that Alarms operate effectively when needed. To avoid inadvertent changes to Alarms, proper engineering study and Management of Change requirements shall be followed rigorously.

While change is a necessary and desirable feature of operations as processes and equipment are adapted to meet operational demands and constraints, the potential risks associated with modifying Alarm settings or removing Alarms, either permanently or temporarily, need to be identified and managed.

Since Alarm systems are part of the Plant’s defence against hazards, any changes resulting from Alarm reviews need to be carried out in a responsible way. Thus, all proposed changes should be fully analyzed, their consequences determined, and agreed changes recorded with reasons. These changes have to be maintained by the Client’s Instrument Maintenance Team.

Specific Alarm system changes that should undergo MOC process may include, but are not limited to, the following:

During the MOC process, the respective Instrument Maintenance Team shall be the Alarm System Owner. The Production Supervisor Engineer shall be the Alarm system champion for the task. They will have authority over all changes made to the Alarm system database and to the Alarm system procedure, including but not limited to items 1 through 5 described above. If any modifications are made to the Alarm system, the changes shall be verified by the Operations and Instrument maintenance teams. F&G Alarms shall be tested every 6 months. An online MOC system should be used to manage all changes.

It is recommended to have a periodic Audit in order to maintain the integrity of the Alarm system and Alarm Management Process. The primary purpose of audit is to reveal the gaps not apparent from monitoring. The Alarm system and associated Alarm Management processes are assessed against Alarm Management Guidelines and Alarm Philosophy procedure. It helps to identify any requirement for system improvements to Alarm philosophy or the work process defined. Audit also includes validation of Alarm management practices against latest Alarm management industry guidelines.

The routine and periodic review of Nuisance and Standing Alarms is fundamental to sustaining the “Improving” performance level. The Alarm Managers (Operation/Plant) are responsible for ensuring that, every month, the “Bad Actors” and oldest standing Alarms are analyzed to identify root causes, with appropriate actions assigned and tracked through to completion, so that they do not continue to devalue the Alarm system and degrade system performance.
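Producing the two lists for the monthly review is straightforward once the Alarm log is available. The sketch below ranks the most frequently activating tags and the longest-standing active Alarms; the tag names, the time units and the function names are hypothetical.

```python
from collections import Counter


def bad_actors(activation_tags, top_n=10):
    """Return the top_n most frequently activating tags with their counts."""
    return Counter(activation_tags).most_common(top_n)


def oldest_standing(active_since, now, top_n=10):
    """Return the top_n active Alarms with the longest time in Alarm.

    `active_since` maps each currently active tag to its activation time;
    `now` uses the same time unit.
    """
    ages = {tag: now - start for tag, start in active_since.items()}
    return sorted(ages.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Illustrative month of activations and currently active Alarms.
activations = ["FI-101", "LI-202", "FI-101", "PI-303", "FI-101", "LI-202"]
top = bad_actors(activations, top_n=2)  # [('FI-101', 3), ('LI-202', 2)]
standing = oldest_standing({"TI-404": 100, "LI-202": 10, "PI-303": 50}, now=200, top_n=2)
# [('LI-202', 190), ('PI-303', 150)]
```

A short list like this keeps the monthly review focused on the few tags that contribute most to the Alarm load.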
