Last In - First Out

Building a Non-Functional Requirements Framework - Requirements Categories

I'm planning on documenting a framework that we built for managing non-functional requirements. This is post #2 of the series.

In Post #1 (Last In - First Out: Building a Non-Functional Requirements Framework - Overview), I outlined the template and definitions for our Non-Functional Requirements.

We also had to address outstanding audit findings that pointed out the lack of enterprise-wide security standards, and blank templates weren't going to cut it. The next step was to create a generic set of Non-Functional Requirements within each category, applicable to any system we'd likely encounter. We then followed up with a structured, objective framework for applying the requirements to a particular system. The next few posts will cover these topics.

To make the NFR's reusable and applicable to as many systems as possible, we created multiple Metric levels within each NFR. Systems with relatively simple requirements would be held to a lower Metric, while systems with higher or stricter requirements would be held to a higher Metric. The Metrics were designed so that the lowest level would apply to a single personal computing device with no stored confidential data, the highest would apply to our largest system with the most confidential or financial data, and the levels in between would apply to systems with intermediate security and availability requirements. This allowed us to write a single requirement applicable to many (or any) systems in proportion to their relative value, without subjecting low-value systems to rigorous requirements.
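As a rough illustration of how tiered Metrics might work, the sketch below holds a system to the strictest level whose threshold its impact reaches. The numeric impact score, the thresholds, and the names `MetricLevel`, `LEVELS` and `required_level` are hypothetical; how a system is actually assigned to a level is the subject of the framework described in later posts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricLevel:
    level: int          # 1 = least strict
    min_impact: int     # hypothetical: lowest impact score that triggers this level
    description: str


# Hypothetical levels for a single NFR; real levels and thresholds would be
# defined in the requirement document itself.
LEVELS = [
    MetricLevel(1, 0, "single personal device, no stored confidential data"),
    MetricLevel(2, 3, "departmental system, limited confidential data"),
    MetricLevel(3, 6, "enterprise system, confidential data"),
    MetricLevel(4, 9, "largest system, confidential and financial data"),
]


def required_level(impact_score: int, levels=LEVELS) -> MetricLevel:
    """Return the strictest level whose threshold the system's impact reaches."""
    applicable = [lvl for lvl in levels if impact_score >= lvl.min_impact]
    if not applicable:
        raise ValueError("impact score is below the lowest defined level")
    return max(applicable, key=lambda lvl: lvl.level)
```

With these made-up thresholds, `required_level(7).level` is 3: the system reaches levels 1 through 3 and is held to the strictest of them.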

Note that Availability, Performance and Reliability, which are requirement categories in other models, are not categories in our model. We determined that a system meeting a set of Resiliency, Recoverability and Security requirements would also meet an appropriate level of availability and reliability as a byproduct of those requirements. Likewise, the system would be able to meet Performance requirements as a byproduct of the Scalability and Maintainability requirements.

Usability, Portability and Compatibility are common requirement families in other models, but because the model was driven by short-term infrastructure and security needs, they were left out of the early phases.

Keep in mind that these categories and requirements were designed to be usable in our environment - a public College and University system.

The categories and a high level description of the requirements in each category follow:

Category: Resiliency

Resiliency requirements describe the ability of the system to continue to function during common failure modes. A resilient system continues to work after routine failures (disk, server, OS or process). Resiliency is necessary to meet availability requirements and usability requirements. A resilient system may use technologies such as redundancy, clustering, load balancing, error handling, and error recovery to function after component failure. Resiliency encompasses the concepts of availability, reliability, robustness, fault tolerance and exception handling as described by other authors.

Our model references three Resiliency requirements - Hardware Resiliency, Software Resiliency, and Environmental Resiliency. Each requirement may define multiple levels within its Metric.

Resiliency-Hardware Requirement: The ability of the system to continue business functionality upon physical failure of hardware components that make up the system.

Incorporates traditional concepts of Redundancy, Clustering, Load Balancing and Fault Tolerance. A system's 'Availability', RPO and RTO are derived from this and other requirements.

This requirement is intended to force the designer to leverage high availability technologies for systems in which the impact of an unavailable system reaches certain thresholds.

Resiliency-Software Requirement: The ability of the system to continue business functionality upon logical failure of software components that make up the system.

Incorporates traditional concepts of Redundancy, Clustering, Load Balancing and Fault Tolerance. A system's 'Availability', RPO and RTO are derived from this and other requirements.

In general, the designer should consider Resiliency – Software, and Resiliency – Hardware NFR’s as a unit and engineer for both NFR’s in concert. In particular, the software must be designed so as to gracefully manage both software and hardware failures using robust transaction management and error handling. Failure modes and failure domains must be well understood.

Resiliency - Environmental Requirement: The ability of the system to continue business functionality upon physical failure of site environmental systems, including power, cooling, and related components.

Incorporates redundant power, redundant cooling, uninterruptible power supplies and generator backup. A system's 'Availability', RPO and RTO are derived from this and other requirements.

This NFR specifies that the facilities-related components that support the system have the appropriate level of recoverability and resiliency.

Designers should engineer for routine power and cooling failures and provide backup power and alternate cooling as necessary. Facilities failure domains such as power supplies, power distribution units, and air conditioning units should be considered.

Category: Recoverability

Recoverability requirements describe the ability to recover from failed states and return the system to its as-built condition. Using the example of a failed unit of hardware, a resilient system will continue to function after the failure, while a recoverable system will have a simple and predictable method for recovering from it. Data backups, data replication, hot-swap hard drives, and automated operating system and application deployment tools are examples of technologies or techniques for recovering a failed component.

Our model references four Recoverability requirements: Component, Site, Configuration and Logical Recoverability. Each requirement may define multiple levels within its Metric.

Recoverability-Component Requirement: The ability to repair or replace system components predictably, with minimum work effort, and with no loss or disruption of business functionality.

Incorporates traditional concepts of Configuration Management and Maintainability. Assures that components can be brought online without maintenance windows.

While the resiliency NFR’s cover the behavior of systems when components fail, the recoverability NFR’s assure that the design of systems includes the ability to restore the system to its original, pre-failure state in a predictable manner.

To assure component recoverability, the designer needs to assure that the configuration of all system components is known, and that a means exists to create new components that are identical to existing components.

Recoverability-Site Requirement: The ability of the system to resume business functionality upon physical or logical failure of the site housing components of the system.

Incorporates traditional concepts of Disaster Recovery, site failover, site replication and off-site backups. A system's 'Availability', RPO and RTO are derived from this and other requirements.

This NFR sets the minimum Recovery Point Objective (RPO) and Recovery Time Objective (RTO) that systems must meet under site-related failures, such as the loss of a data center, building or campus.
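Because RPO and RTO are measurable, a failover test (or a real incident) can be scored directly against them. A minimal sketch, assuming the recovery point is the timestamp of the last replicated or backed-up state available at the recovery site; the function name `meets_site_objectives` and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def meets_site_objectives(last_recovery_point: datetime,
                          failure_time: datetime,
                          service_restored: datetime,
                          rpo: timedelta,
                          rto: timedelta) -> dict:
    """Score one site failover against the RPO and RTO set by the requirement."""
    if not (last_recovery_point <= failure_time <= service_restored):
        raise ValueError("timestamps are out of order")
    data_loss = failure_time - last_recovery_point   # work that cannot be recovered
    downtime = service_restored - failure_time       # time business function was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }
```

For example, with the last replica at 02:00, a site failure at 02:45 and service restored at 06:00, the system loses 45 minutes of data and is down for 3 hours 15 minutes, which satisfies a 1-hour RPO and a 4-hour RTO but not a 30-minute RPO.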

Recoverability - Configuration Requirement: The ability of the system to resume business functionality upon logical failure of system metadata or system configuration information.

Incorporates the traditional concepts of change management (portions of), configuration management, and test and back-out plans for planned configuration changes.

The intent of this NFR is to provide assurance that the system is designed and managed such that if any portion of the configuration of the system is modified for any reason, intentionally or not, the system can be recovered back to the state that it was in pre-modification. This is intended to discourage systems in which the configuration is ad-hoc, unstructured, or 'mouse driven', as compared to template or script driven configurations.

Recoverability - Logical Requirement: The ability of the system to resume business functionality upon logical failure of application managed business data.

Incorporates traditional concepts of database 'point in time recovery', file system snapshots and daily backups. A system's RPO is derived from this and other requirements.

This NFR is intended to ensure that the system is designed so that, after its data has been modified outside of normal business practices (i.e., logical file system or database corruption, poor configuration management, or unauthorized data modification by internal or external entities), the data managed by the system can be recovered to its state at a point in time prior to the modification.
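Point-in-time recovery ultimately comes down to choosing the most recent recovery point that predates the bad modification. A minimal sketch, assuming the snapshot (or backup) times are known and the time of the corrupting or unauthorized change has already been established, which is often the hard part; `restore_point` is a hypothetical helper.

```python
import bisect
from datetime import datetime


def restore_point(snapshots: list[datetime], bad_change_at: datetime) -> datetime:
    """Return the latest snapshot taken strictly before the bad change."""
    ordered = sorted(snapshots)
    # bisect_left gives the index of the first snapshot at or after the change,
    # so the entry just before it is the newest clean recovery point.
    i = bisect.bisect_left(ordered, bad_change_at)
    if i == 0:
        raise LookupError("no snapshot predates the modification")
    return ordered[i - 1]
```

A snapshot taken at exactly the moment of the change is treated as suspect and skipped, which errs on the side of losing a little more data rather than restoring corrupted data.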

Category: Scalability

Our model has a single Scalability Requirement, which may define multiple levels within its Metric.

Scalability requirements describe the ability to add and remove capacity to the system without affecting the availability of the system, while maximizing maintainability and constraining costs.

Scalability - Component Requirement: The ability to dynamically and cost effectively add or remove capacity by adding or removing hardware or software components.

Incorporates the traditional concept of 'Horizontal Scalability', load balancing and dynamic capacity management. Assures that systems are compatible with cloud technologies.

The intent of this NFR is to force systems into a horizontally scalable architecture, and to limit or prohibit designs that depend on large-scale hardware upgrades to add capacity. In other words, systems must be designed to scale out, not scale up.
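Scaling out turns capacity planning into simple arithmetic on identical nodes. A sketch, assuming homogeneous nodes behind a load balancer, a single per-node capacity figure, and a number of spare nodes kept so the service survives that many node failures (which ties in with Resiliency-Hardware); `nodes_required` and the figures below are hypothetical.

```python
import math


def nodes_required(peak_load: float, capacity_per_node: float, spares: int = 1) -> int:
    """Identical nodes needed to carry peak_load while `spares` nodes are down."""
    if peak_load < 0 or capacity_per_node <= 0 or spares < 0:
        raise ValueError("invalid sizing inputs")
    # Round up: a fractional node still means one more whole node.
    return math.ceil(peak_load / capacity_per_node) + spares
```

With a hypothetical peak of 1,800 requests per second and 500 per node, `nodes_required(1800, 500)` gives 5 (four to carry the load plus one spare). Growth to 2,600 requests per second gives 7, met by adding two nodes rather than replacing existing hardware with larger machines.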

Category: Maintainability

Our model has a single Maintainability Requirement, which may define multiple levels within its Metric.

Maintainability requirements describe the ability to maintain the system over its operational life. Among other attributes, a maintainable system can have routine hardware upgrades and application deployments without user-affecting outages, has monitoring, logging and auditing sufficient for routine troubleshooting, and has a low operational cost. Maintainability encompasses manageability, upgradability, deployability and flexibility as described by other authors.

Maintainability-Component Requirement: The ability to maintain the hardware, software and environmental components of a system without disrupting business functionality, and with minimal or no planned system outages.

Incorporates traditional concepts of Service Management, Change Management (portions of), Maintenance Windows and Continuous Maintenance. Assures that the effect of system maintenance on users will be minimized.

This requirement forces the designer to consider the maintainability of the system as a part of the design process. The designer should select and configure components such that:

Routine maintenance can be conducted on-line, using common technologies such as load balancing and clustering or equivalent.
Application patches and upgrades can be implemented on-line.
The release of new application functionality, including database schema changes, can be done on-line in many or most cases.

Category: Security

The ability to maintain the confidentiality and integrity of a system and the data contained in or controlled by the system. Requirements related to system access, system integrity, system confidentiality and system configuration.

Our model references six Security Requirements - Configuration Integrity, Configuration Assessment, Data Classification, Data Encryption, Data Access, and Awareness and Training.

Security - Configuration Integrity Requirement: The ability to determine the source of modifications to the logical and physical configuration of a system. Logging and auditing of configuration information and changes. The ability to prevent or detect unauthorized changes to configuration or data. The ability to respond to unauthorized access or modification of system configuration or data. The ability to determine the configuration of a system at an arbitrary point in time in the past.

Incorporates the traditional concepts of Configuration Management, Change Management (portions of), security auditing, Business Activity Logging, Intrusion Detection/Prevention and Malware Detection/Prevention, and security incident handling.

The intent of this requirement is to ensure that the system is designed so that:

The system can support/enable least privilege and role based system configuration.
Configuration changes are detectable. This implies technologies such as routine, scheduled, continuous, or near-continuous configuration auditing.
Auditing of changes in configuration creates an immutable audit trail, and the audit trail is properly secured.
The configuration of a system can be recovered back to the state that the system was in prior to the modification.
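One way to make configuration changes detectable is to fingerprint each configuration item and compare it against a recorded baseline. The sketch below assumes configuration can be read as files; `fingerprint`, `snapshot_dir` and `detect_drift` are hypothetical helpers, not part of any specific tool. The baseline itself would need to be stored where the system's own administrators cannot silently alter it.

```python
import hashlib
from pathlib import Path


def snapshot_dir(root: str) -> dict[str, bytes]:
    """Read every file under a (hypothetical) configuration directory."""
    base = Path(root)
    return {str(p.relative_to(base)): p.read_bytes()
            for p in base.rglob("*") if p.is_file()}


def fingerprint(files: dict[str, bytes]) -> dict[str, str]:
    """Map each configuration item to the SHA-256 digest of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def detect_drift(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report configuration items added, removed or modified since the baseline."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(k for k in baseline.keys() & current.keys()
                      if baseline[k] != current[k])
    return {"added": added, "removed": removed, "modified": modified}
```

Run on a schedule, a non-empty result from `detect_drift` is the signal that triggers the audit and recovery steps listed above.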

Security - Configuration Assessment Requirement: The assurance that the initial configuration of the system is appropriately secure, that the system configuration is maintained in an appropriately secure state over the life of the system and that the state is verified and tested.

Incorporates the traditional concepts of system hardening, code review, Vulnerability Management, Pen Tests, Patch Management and least privilege for access and modification of system configuration.

The intent of this requirement is to ensure that systems are initially configured to a secure state, and that they remain in that state over the life of the system.

The initial condition of the system is ‘hardened’ consistent with this requirement.
A process or method must be implemented to ensure that the system is maintained in that state over its lifetime.
The condition of the system is verified periodically, depending on the Level within the requirement, for example by using vulnerability scans of systems and application code.
The application code is written and tested in accordance with a formal software development practice.
Technologies, tools, frameworks and libraries are implemented in a consistently secure manner.

Security - Data Classification Requirement: The classification of data consistent with State and Federal regulations and the assignment of data ownership.

Security - Data Encryption Requirement: The conditions under which data must be transported, transmitted and stored in an unreadable, encrypted format.

Incorporates the traditional concepts of protecting data using encryption such that the data is only readable by authorized individuals.

The intent of this requirement is to ensure transport layer security is implemented for data that is transmitted over a less trusted network, and that encryption is implemented for data at rest. Encryption of data at rest may include full disk encryption, database encryption, and/or encryption of backup media.

Security - Data Access Requirement: The ability to limit logical and physical access to systems and data to authorized individuals, the ability to limit modification of systems and data to authorized individuals, the logging and auditing of system and data access, and the ability to alert on unauthorized access.

Includes traditional concepts such as account provisioning and management, account credentials, authorization, least-privilege data access, business activity logging and audit logging, security perimeters and perimeter controls.

The intent of this requirement is to limit access to data based on need-to-know to perform job duties and to alert on inappropriate access, and/or have an audit trail of access or activities (i.e. read, write, modify, delete) that can be traced to an individual.
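A hash chain is one common way to make an audit trail tamper-evident and traceable to individuals: each record carries the hash of the previous record, so altering an earlier record breaks verification. The `AuditLog` class and its field names are hypothetical. A hash chain alone does not detect truncation of the most recent records and does not make the log immutable; in practice the log would also be shipped to separately administered, write-once storage.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


@dataclass
class AuditLog:
    """Append-only, hash-chained log: each record commits to the one before it."""
    entries: list = field(default_factory=list)

    def append(self, user: str, action: str, target: str, timestamp: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        record = {"user": user, "action": action, "target": target,
                  "timestamp": timestamp, "prev": prev_hash}
        record["hash"] = self._digest(record)
        self.entries.append(record)

    @staticmethod
    def _digest(record: dict) -> str:
        # Hash every field except the stored hash, in a stable key order.
        body = {k: v for k, v in record.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        """False if any record was altered, or records were removed mid-chain or reordered."""
        prev = GENESIS
        for record in self.entries:
            if record["prev"] != prev or record["hash"] != self._digest(record):
                return False
            prev = record["hash"]
        return True

    def by_user(self, user: str) -> list:
        """All activity traceable to one individual."""
        return [r for r in self.entries if r["user"] == user]
```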

Security - Awareness and Training Requirement: The assurance that system administrators are adequately skilled and knowledgeable in information security and the implementation, management and maintenance of systems for which they are responsible.

The intent of this requirement is to ensure system administrative personnel have the skills, knowledge and/or experience to effectively implement requirements defined by Federal or State law, regulations, contractual agreements, Policies, Procedures or other non-functional requirements.

Checkpoint:

I've described templates, categories and a high level view of our Non-Functional Requirements. Next up - a series of posts describing each requirement, followed by a framework for applying the NFR's to an IT system.

Building a Non-Functional Requirements Framework - Overview

I'm planning on documenting a framework that we built for managing non-functional requirements. This is post #1 of the series.

A pain point for our infrastructure and security teams was a lack of usable, consistent availability and security requirements for our internally developed applications. The business analysts worked with the organization to create requirements for the functionality of the application but ignored most of what infrastructure, identity management, and security would need until the end of the development process. By the time these teams got insight into the application, it was too late to wedge in new requirements. The net result was that the organization was promised applications or enhancements, but because no consideration had been given to non-functional requirements, deadlines were often missed. The worst example was the pending release of a major new application that allowed manipulation of financial information, but for which no consideration had been given to authentication or authorization requirements, or to database and application hosting security. Retrofitting that project added a year to the timeline.

Additionally, we had a series of outstanding audit findings related to the lack of enterprise-wide standards for securing systems. We tended to build secure and available systems because we knew what we were doing - not because we built to an objective, measurable standard. Auditors would prefer that we built to a standard that ensured a secure, available system - and of course we agreed.

When I had a few months of downtime (approx. 2012-2013), I decided to see what the state of the art was in creating and maintaining non-functional requirements (NFR's). I looked at the obvious candidates (FURPS+, ISO-9126, ISO-25010) and a handful of university-published research papers. My biggest issue with the existing models was that they were software specific. I felt that NFR's should apply to entire systems, not just the software running on the system.

As far as I could tell at the time, the various sources, authors, consultants and Gartner didn't really agree on much other than that NFR's are not Functional Requirements and that you need to have some. I found that:

Many web sites have lists and examples of NFR's.
Some try to define NFR's, few succeed.
Others admit that NFR's are difficult to gather.
Few apply NFR’s to systems (vs. software).
FURPS+, ISO-9126, ISO-25010 and similar didn't treat security as a first-class citizen, nor did they address legal requirements.

What I did find, though, were a couple of sources that I thought I could use to build a set of generic non-functional requirements.

Erik Simmons and John Terzakis (Intel) each have a fair bit of good information in various presentations that are readily searchable.
Tom Gilb's 'Planguage' seemed like a valuable tool, and both Simmons and Terzakis describe how to use Planguage for requirements writing.

See:

Specifying Effective Non-Functional Requirements, John Terzakis, Intel Corporation, June 24, 2012, ICCGI Conference, Venice, Italy
21st Century Requirements Engineering: A Pragmatic Guide to Best Practices, Erik Simmons, Intel Corporation

These sources were close to being adaptable, but rather than try to adopt an existing framework as-is, I thought that it'd be best for us to come up with something usable by borrowing from various existing sources, primarily borrowing bits and pieces from Simmons, Terzakis, and Gilb.

Into the Non-Functional Requirement Abyss

We agreed that Requirements are not designs and should not specify a particular technology or configuration. Requirements should specify an end result, not the path to achieve that result. We tried to keep this in mind as we worked out our framework.

Our starting point (and first disagreement…) was on the definition of non-functional requirements. Here's what we used:

Functional Requirements describe the intended behavior of the system (or software), or what a system should do.
Non-functional Requirements describe how well the system does whatever it does and under what constraints the system must operate. NFR's describe operational characteristics, performance, availability, etc.

We decided to use a variation of the common 'S.M.A.R.T.' framework as a standard for writing the requirements. By placing bounds on the requirements-writing process, we hoped to end up with requirements that would have a chance of being valuable to the organization.

S.M.A.R.T.

Our version of 'S.M.A.R.T':

Specific: Requirements will be clear, concise, unambiguous, with consistent terminology, and with detail sufficient such that designs based on the requirements will meet operational goals.

Measurable: A test can be devised that verifies the requirement using a bounded measurement.

Attainable: The requirement is technically feasible within the constraints of current technology, and at least one design and implementation exists that can meet it.

Realizable: The requirement is fiscally and manageably implementable within the constraints of organizational budget and staffing.

Unambiguous: The requirement will have a single, non-conflicting interpretation.

Traceable: The source of a requirement will be traceable to stakeholder need. The requirement is traceable to business strategy or roadmap. The life cycle of the requirement is traceable from its conception to its current state.

Specificity and Measurability were considered important because we hoped it would keep us from writing vague requirements or requirements for which there were no means of measuring attainment.
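To make "a bounded measurement" concrete: given the Minimum and optional Target that the template (later in this post) attaches to each Metric, a single measurement can be placed against both bounds. This is a sketch, not the framework's own tooling; `Result`, `assess` and the `higher_is_better` flag are hypothetical, and the flag exists because some scales (availability) improve upward while others (recovery time) improve downward.

```python
from enum import Enum
from typing import Optional


class Result(Enum):
    FAILS = "fails minimum"
    MEETS_MINIMUM = "meets minimum"
    MEETS_TARGET = "meets target"


def assess(measured: float, minimum: float, target: Optional[float] = None,
           higher_is_better: bool = True) -> Result:
    """Place one measurement against a requirement's Minimum and optional Target."""
    def at_least(value: float, bound: float) -> bool:
        # "At least as good as": >= when bigger is better, <= when smaller is better.
        return value >= bound if higher_is_better else value <= bound

    if not at_least(measured, minimum):
        return Result.FAILS
    if target is not None and at_least(measured, target):
        return Result.MEETS_TARGET
    return Result.MEETS_MINIMUM
```

For a hypothetical recovery-time Metric with a Minimum of 4 hours and a Target of 2 hours, `assess(3.0, 4.0, 2.0, higher_is_better=False)` meets the minimum but not the target.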

Attainability and Realizability were intended to prevent the implementation of requirements for which there was no solution possible, or no solution that was actually implementable in our environment with our limited capabilities.

Traceability was desired to prevent the imposition of requirements for which there was no business need (requirements for the sake of requirements, or requirements to give us an excuse to buy shiny new resume-building technology) or requirements that appeared out of nowhere or were modified outside of a formal process.

Requirement Categories

Because we like putting things in neat buckets, we created broad categories of NFR's for which we thought we'd have an immediate need. The various industry models have categories (Maintainability, Reliability, Portability, etc.), but our thinking at the time was that those categories didn't work for us. So we started from scratch and ended up with the following:

Resiliency - The requirements that describe the ability of the system to continue to function during common failure modes. A resilient system continues to work after routine failures (disk, server, OS or process). Resiliency is necessary to meet availability requirements and usability requirements. A resilient system may use technologies such as redundancy, clustering, load balancing, error handling, and error recovery to function after component failure. Resiliency encompasses the concepts of availability, reliability, robustness, fault tolerance and exception handling as described by other authors.

Recoverability - The requirements that describe the ability to recover from failed states and return the system to its as-built condition. Using the example of a failed unit of hardware, a resilient system will continue to function after failure, while a recoverable system will have a simple and predictable method for recovering from the hardware failure. Data backups, data replication, hot-swap hard drives, and automated operating system and application deployment tools may be technologies or techniques to recover a failed component.

Maintainability - The requirements that describe the ability to maintain the system over its operational life. Among other attributes, a maintainable system can have routine hardware upgrades and application deployments without user-affecting outages, has monitoring, logging and auditing sufficient for routine troubleshooting, and has a low operational cost. Maintainability encompasses manageability, upgradability, deployability and flexibility as described by other authors.

Scalability - The requirements that describe the ability to add and remove capacity to the system without affecting the availability of the system, while maximizing maintainability and constraining costs.

Security - The ability to maintain the confidentiality and integrity of a system and the data contained in or controlled by the system. Requirements related to system access, system integrity, system confidentiality and system configuration.

These can be mapped back to FURPS+, ISO-9126, ISO-25010, ISO-27002, NIST 800-53, etc.

Note that Availability, Performance and Reliability are not requirements categories in our model. We determined that if a system met a set of Resiliency, Recoverability and Security requirements, the system would also meet an appropriate level of availability and reliability as a byproduct of those requirements. Likewise, the system would be able to meet Performance requirements as a byproduct of scalability and maintainability requirements.

Non-Functional Requirements Form & Format

Following the work done by Simmons & Terzakis (Intel) we decided to implement a modified template and Planguage-like structured language for the NFR's. Each NFR exists as a single document.

The Non-functional requirements template and definitions that we settled on are:

Category: A text field representing the category that the requirement is classified under in the Minnesota State Model. The Category and Context are equivalent to the 'ID:' in Planguage or 'Ambition' in (Simmons/Intel 2011).

Context: A text field representing the requirement, unique within a category. The Category and Context are equivalent to the 'ID:' in Planguage or 'Ambition' in (Simmons/Intel 2011).

Goals: Natural language description of the intent of the requirement and how it supports one or more of the general goals. The Goal is equivalent to 'Gist:' in Planguage or 'Ambition' in (Simmons/Intel 2011).

Rationale: The reason that the requirement exists. Expressed in natural language.

Requirement: The requirement to which the system will be held, written in a constrained natural language meeting the Minnesota State Non-Functional Requirements Attributes.

Metric: The measurement used to determine whether the requirement has been met, and the process or device used to locate the measurement on the scale. The Metric must include 'Minimum', the minimum acceptable measurement, and may include 'Target', the measurement to which the system must be designed.

Scale: The scale of measure used to quantify the requirement.

Stakeholders: Persons who stand to gain or lose by implementation of requirements. Expressed as roles, not individuals.

Implications: Implications to the stakeholders if these requirements are not met.

Applicability: Systems or categories of systems to which requirement applies.

Status: One of Draft, Approved, Revised, or other constrained choice of statuses matching the requirements implementing process.

Author: Person responsible for authoring and maintaining requirement.

Revision: Sequential number representing approved revision of requirement.

Date: Date of last revision of requirement.

The NFR's have a structure and format that could be adapted to metadata driven requirements tooling.
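As an illustration of that point, the template maps naturally onto a typed record that requirements tooling could store and validate. A minimal sketch: the class and field names are hypothetical renderings of the template fields above, and the `Status` enum lists only the three statuses the template names.

```python
import datetime as dt
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    # Constrained choice of statuses; extend to match the approval process.
    DRAFT = "Draft"
    APPROVED = "Approved"
    REVISED = "Revised"


@dataclass
class Metric:
    minimum: float                  # 'Minimum': minimum acceptable measurement (required)
    method: str                     # process or device used to locate the measurement on the scale
    target: Optional[float] = None  # 'Target': measurement the design aims for (optional)


@dataclass
class NFR:
    category: str              # e.g. "Recoverability"
    context: str               # unique within the category, e.g. "Site"
    goals: str
    rationale: str
    requirement: str           # constrained natural language
    metric: Metric
    scale: str                 # unit of measure used to quantify the requirement
    stakeholders: List[str]    # roles, not individuals
    implications: str
    applicability: str
    status: Status
    author: str
    revision: int
    date: dt.date              # date of last revision

    @property
    def key(self) -> str:
        """Category plus Context together identify the requirement."""
        return f"{self.category}/{self.context}"
```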

Checkpoint

At this stage we had a handful of Non-Functional Requirements categories and a template for writing the NFR's, but no actual requirements.

Next up: Part #2 - A high level description of each Non-Functional Requirement

Thirty-Four Years in IT - Why not Thirty-Five?

After I was sidelined (Part 10), we had another leadership turnover. This time the turnover was welcome. I ended up in a leadership position under a new CIO, which allowed me to put to use some topics that I had studied while I was sidelined. My new team took on two challenges: (1) introducing cloud computing to the organization, and (2) attempting to add a bit of architectural discipline to the development and infrastructure teams and processes. The first was somewhat successful; the latter was not.

Cloud

I had been slowly working to get a master agreement with Amazon - a long, slow process when you are a public sector agency. When our new CIO mentioned 'cloud' I did a bit of digging and found out that Microsoft had added the phrase 'and Azure' to our master licensing agreement. Microsoft's foresight saved me months of contract negotiations. They made it trivial to set up an enterprise Azure account. So Azure became our default 'cloud'.

I had been running the typical nerdville home servers. Moving them from in-house Macs to Linux in Azure was trivial - a weekend of messing around. I affirmed to our CIO that we had a fair number of apps that could be hosted in IaaS, and picked a couple of crash-test dummy apps for early migration.

One of my staff and I spent a few months creating and destroying various assets in Azure, and came to the conclusion that the barriers to cloud adoption would be found mostly in our own staff, not in the technology stack. Infrastructure staff would have to re-think their jobs and their roles in the organization, and development staff would have to re-think application design. Both would challenge the organization.

I also did a few quick-and-dirty demonstrations to get some ideas on how we might architect an enterprise framework for moving to Azure - such as hiding an Azure instance behind a firewall in our test lab to show that we could create virtual data centers that appeared to be in our RFC-1918 address space, but were actually in Azure IaaS. We also presented quite a bit of what we learned to our campus IT staff at various events and get-togethers, hoping to build a bit of momentum at the campuses.

On the downside, I ran into significant barriers among our own managers and their staff. A quorum of managers and staff were cloud-averse and/or firmly committed to technologies and vendors that had no cloud play. We had to fight FUD from within.

Architecture

The Architecture activity was not successful. We had been running 'seat-of-the-pants' for years, resulting in many ad-hoc and orphaned tools, technologies and languages, and we were thinly staffed. So the idea that by adding rigor and overhead up front we'd end up with better technology that was less work to maintain was not well accepted. The entire concept of design first, then build was a tough sell, as the norm had been to start building first and figure out the design on the fly (if at all). Modern architectures such as presenting an API to our campuses were rejected outright. And of course the idea that two development teams or two infrastructure workgroups would agree on a tool, language or library, much less an architecture, was an even tougher sell.

The team (and any semblance of a formal architecture) was disbanded through attrition, and the body of standards, guidelines, processes, and practices are no doubt still in a SharePoint site, unmaintained and unloved.

Why did I leave when I did?

As time went on, I found myself in fundamental disagreement with how the organization treated its people. Leadership was making personnel decisions that I could not support, that caused the loss of several of our best people, and that placed other staff in positions where they could not succeed or be happy.

That leadership would move staff into positions in which they had no interest, and do it without the concurrence of their manager (me), was unacceptable. To pile on work that was outside the core skillset of an employee, and then try to destroy their career when they were failing, is unacceptable. I don't want to work for an organization like that, and because of financial decisions I made years ago I do not have to work for an organization like that.

I did the math, got my ducks in a row, and retired.

My only regret is that I was unable to influence the disposition of the staff that I left behind.

Previous: Part 10 Leadership Chaos, Career derailed