DOTS: Diversity with Off-The-Shelf Components

Bibliography

Reports and papers from the DOTS project are listed below.

Bibliographies related to previous research on diversity and on dependability are also available.

[1] T. Anderson, M. Feng, S. Riddle, and A. Romanovsky. Error Recovery for a Boiler System with OTS PID Controller. Technical Report ECOOP '03 Workshop on Exception Handling in Object-Oriented Systems, TR 03-028, Dept of Computer Science, Univ. of Minnesota, USA, 2003.
[ bib | .ps | .pdf ]

We have previously presented initial results of a case study which illustrated an approach to engineering protective wrappers as a means of detecting errors or unwanted behaviour in systems employing an OTS (Off-The-Shelf) item. The case study used a Simulink model of a steam boiler system together with an OTS PID (Proportional, Integral and Derivative) controller. The protective wrappers are developed for the model of the system in such a way that they allow detection and tolerance of typical errors caused by unavailability of signals, violations of range limitations, and oscillations. In this paper we extend the case study to demonstrate how forward error recovery based on exception handling can be systematically incorporated at the level of the protective wrappers.
[2] T. Anderson, M. Feng, S. Riddle, and A. Romanovsky. Investigative Case Study: Protective Wrapping of OTS items in Simulated Environments. Technical Report CS-TR 821, School of Computing Science, Univ. of Newcastle, UK, 2003.
[ bib | .pdf ]

This practical experience report summarises the lessons learned during investigation of a case study which focused on engineering protective wrappers as a means of detecting and tolerating errors or undesirable behaviour in systems employing OTS components. We developed a protective wrapper capable of dealing with typical errors caused by unavailability of signals, violations of range limitations, and oscillations. The work was carried out in a simulation environment using a Simulink model of an industrial steam boiler system together with an OTS PID (Proportional, Integral and Derivative) controller. The lessons learned from the development of, and experimentation with, our case study are categorised as: those relating specifically to the use of Simulink for system modelling; those that concern the use of simulation more generally, as a means of analysing design options; and those that inform the development of protective wrappers.
[3] T. Anderson, M. Feng, S. Riddle, and A. Romanovsky. Protective Wrapper Development: A Case Study, volume 2580 (2nd Int. Conf. on COTS-Based Software Systems, Ottawa, Canada - ICCBSS '03) of LNCS, pages 1-14. Springer, 2003.
[ bib | .pdf ]

We have recently proposed a general approach to engineering protective wrappers as a means of detecting errors or unwanted behaviour in systems employing an OTS (Off-The-Shelf) item, and launching appropriate recovery actions. This paper presents results of a case study in protective wrapper development, using a Simulink model of a steam boiler system together with an OTS PID (Proportional, Integral and Derivative) controller. The protective wrappers are developed for the model of the system in such a way that they allow detection and tolerance of typical errors caused by unavailability of signals, violations of constraints, and oscillations.
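
The three error classes named in this abstract (unavailable signals, constraint violations, and oscillations) can be illustrated with a minimal sketch. This is not the Simulink wrapper from the paper: the class name `ProtectiveWrapper`, the exception names, the window size, and the reversal threshold are all hypothetical, and the controller is assumed to be a plain Python callable. The sketch only detects errors and raises exceptions; recovery is left to the caller.

```python
from collections import deque


class WrapperError(Exception):
    """Base class for errors detected by the (hypothetical) wrapper."""


class SignalUnavailable(WrapperError):
    pass


class RangeViolation(WrapperError):
    pass


class OscillationDetected(WrapperError):
    pass


class ProtectiveWrapper:
    """Checks the signals flowing into and out of a wrapped controller."""

    def __init__(self, controller, lo, hi, window=6, max_reversals=3):
        self.controller = controller        # assumed: callable(measurement) -> output
        self.lo, self.hi = lo, hi           # assumed permitted output range
        self.history = deque(maxlen=window) # most recent accepted outputs
        self.max_reversals = max_reversals  # assumed oscillation threshold

    def step(self, measurement):
        # Unavailable input signal.
        if measurement is None:
            raise SignalUnavailable("no input signal")
        output = self.controller(measurement)
        # Unavailable output signal.
        if output is None:
            raise SignalUnavailable("controller produced no output")
        # Range (constraint) violation on the output.
        if not (self.lo <= output <= self.hi):
            raise RangeViolation(f"output {output} outside [{self.lo}, {self.hi}]")
        # Oscillation: too many direction reversals in the recent window.
        self.history.append(output)
        if self._reversals() >= self.max_reversals:
            raise OscillationDetected("output direction reversed too often")
        return output

    def _reversals(self):
        values = list(self.history)
        deltas = [b - a for a, b in zip(values, values[1:]) if b != a]
        return sum(1 for d1, d2 in zip(deltas, deltas[1:]) if d1 * d2 < 0)
```

A caller would wrap each control step in a `try`/`except WrapperError` block and choose a recovery action there, for example holding the last accepted output.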
[4] T. Anderson, M. Feng, S. Riddle, and A. Romanovsky. Wrapping it up. Safety Systems, 13(1):8-10, 2003.
[ bib | .pdf ]

Many siren voices, and some harsh economic facts, argue in favour of off-the-shelf (OTS) components as a way to reduce the costs of software system development. Compared to bespoke design and development, the OTS option offers a number of potential benefits, including immediate availability, being proven in use, and low price due to amortisation. The increasing scale and complexity of modern software systems is a powerful driver for modularity in design, which clearly chimes with a structured and therefore component (or sub-system) based approach.
[5] T. Anderson, B. Randell, and A. Romanovsky. Wrapping the Future. In 18th IFIP World Computer Congress, pages 165-173, Toulouse, France, 2004.
[ bib | .pdf ]

Enclosing a component within a software “wrapper” is a well-established way of adapting components for use in new environments. This paper presents an overview of an experimental evaluation of the use of a wrapper to protect against faults arising during the (simulated) operation of a practical and critical system; the specific context is a protective wrapper for an off-the-shelf software component at the heart of the control system of a steam raising boiler. Encouraged by the positive outcomes of this experimentation we seek to position protective wrappers as a basis for structuring the provision of fault tolerance in component-based open systems and networks. The paper addresses some key issues and developments relating wrappers to the provision of dependability in future computing systems.
[6] J.G.W. Bentley, P.G. Bishop, and M. van der Meulen. An Empirical Exploration of the Difficulty Function, volume 3219 (SAFECOMP '04, Potsdam, Germany) of LNCS, pages 60-71. Springer, 2004.
[ bib | .pdf ]

The theory developed by Eckhardt and Lee (and later extended by Littlewood and Miller) utilises the concept of a difficulty function to estimate the expected gain in reliability of fault tolerant architectures based on diverse programs. The difficulty function is the likelihood that a randomly chosen program will fail for any given input value. To date this has been an abstract concept that explains why dependent failures are likely to occur. This paper presents an empirical measurement of the difficulty function based on an analysis of over six thousand program versions implemented to a common specification. The study derived a score function for each version. It was found that several different program versions produced identical score functions, which when analysed, were usually found to be due to common programming faults. The score functions of the individual versions were combined to derive an approximation of the difficulty function. For this particular (relatively simple) problem specification, it was shown that the difficulty function derived from the program versions was fairly flat, and the reliability gain from using multi-version programs would be close to that expected from the independence assumption.
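
As a rough illustration of the idea in this abstract, the sketch below estimates a difficulty function from per-version score functions and compares the expected failure probability of a 1-out-of-2 pair with the figure obtained under the independence assumption. It rests on assumptions not stated in the abstract: each score function is taken to be a list of 0/1 failure indicators over the same finite set of inputs, all inputs are equally likely, and the function names are hypothetical. In the Eckhardt and Lee model the expected pair failure probability is the mean of the squared difficulty, which is never smaller than the square of the mean.

```python
from fractions import Fraction


def difficulty(score_functions):
    """Fraction of versions that fail on each input (assumed 0/1 scores)."""
    n_versions = len(score_functions)
    n_inputs = len(score_functions[0])
    return [
        Fraction(sum(version[x] for version in score_functions), n_versions)
        for x in range(n_inputs)
    ]


def expected_pfd_single(theta):
    """Mean difficulty: failure probability of a randomly chosen version."""
    return sum(theta) / len(theta)


def expected_pfd_pair(theta):
    """Mean squared difficulty: both of two randomly chosen versions fail."""
    return sum(t * t for t in theta) / len(theta)


# Toy data (made up for illustration): 4 versions, 4 inputs, 1 = failure.
versions = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
theta = difficulty(versions)
single = expected_pfd_single(theta)
pair = expected_pfd_pair(theta)
independent = single * single
```

On this toy data the difficulty is uneven across inputs, so `pair` exceeds `independent`. A flat difficulty function, as reported in the abstract, would make the two values equal.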
[7] R. de Lemos, C. Gacek, and A. Romanovsky. Architectural Mismatch Tolerance, volume 2677 of LNCS, pages 175-194. Springer, 2003.
[ bib | .pdf ]

The integrity of complex software systems built from existing components is becoming more dependent on the integrity of the mechanisms used to interconnect these components and, in particular, on the ability of these mechanisms to cope with architectural mismatches that might exist between components. There is a need to detect and handle (i.e. to tolerate) architectural mismatches during runtime because in the majority of practical situations it is impossible to localize and correct all such mismatches during development time. When developing complex software systems, the problem is not only to identify the appropriate components, but also to make sure that these components are interconnected in a way that allows mismatches to be tolerated. The resulting architectural solution should be a system based on the existing components, which are independent in their nature, but are able to interact in well-understood ways. To find such a solution we apply general principles of fault tolerance to dealing with architectural mismatches.
[8] I. Gashi, P. Popov, and L. Strigini. Fault diversity among off-the-shelf SQL database servers. In Int. Conf. on Dependable Systems and Networks (DSN '04), pages 389-398, Florence, Italy, 2004.
[ bib ]

Fault tolerance is often the only viable way of obtaining the required system dependability from systems built out of off-the-shelf (OTS) products. We have studied a sample of bug reports from four off-the-shelf SQL servers so as to estimate the possible advantages of software fault tolerance - in the form of modular redundancy with diversity - in complex off-the-shelf software. We checked whether these bugs would cause coincident failures in more than one of the servers. We found that very few bugs affected two of the four servers, and none caused failures in more than two. We also found that only four of these bugs would cause identical, undetectable failures in two servers. Therefore, a fault-tolerant server, built with diverse off-the-shelf servers, seems to have a good chance of delivering improvements in availability and failure rates compared with the individual off-the-shelf servers or their replicated, non-diverse configurations.
[9] I. Gashi, P. Popov, V. Stankovic, and L. Strigini. On Designing Dependable Services with Diverse Off-The-Shelf SQL Servers, volume 3069 (Architecting Dependable Systems) of LNCS, pages 196-220. Springer, 2004.
[ bib | .pdf ]

The most important non-functional requirements for an SQL server are performance and dependability. This paper argues, based on empirical results from our on-going research with diverse SQL servers, in favour of diverse redundancy as a way of improving both. We show evidence that current data replication solutions are insufficient to protect against the range of faults documented for database servers; outline possible fault-tolerant architectures using diverse servers; discuss the design problems involved; and offer evidence of the potential for performance improvement through diverse redundancy.
[10] P.A. de C. Guerra, C.M.F. Rubira, A. Romanovsky, and R. de Lemos. A Fault-Tolerant Software Architecture for COTS-Based Software Systems. In 4th ESEC/SIGSOFT FSE Conf., pages 375-378, Helsinki, Finland, 2003.
[ bib ]

This paper considers the problem of integrating Commercial off-the-shelf (COTS) components into systems with high dependability requirements. Such components are built to be reused as black boxes that cannot be modified. The system architect has to rely on techniques that are external to the component for resolving mismatches between the services required and provided that might arise in the interaction of the component and its environment. The paper puts forward an approach that employs the layer-based C2 architectural style for structuring error detection and recovery mechanisms to be added to the component during system integration.
[11] P.A. de C. Guerra, C.M.F. Rubira, A. Romanovsky, and R. de Lemos. Integrating COTS Software Components into Dependable Software Architectures. In 6th IEEE Int. Symp. on Object-Oriented Real-Time Distributed Computing (ISORC '03), pages 139-142, Hakodate, Japan, 2003.
[ bib | .pdf ]

This paper considers the problem of integrating commercial off-the-shelf (COTS) software components into systems with high dependability requirements. These components, by their very nature, are built to be reused as black boxes that cannot be modified. Instead, the system architect has to rely on techniques external with respect to the component for resolving mismatches of the services required and provided that might arise in the interaction of the component and its environment. This paper proposes an architectural solution to turning COTS components into idealised fault-tolerant COTS components by adding protective wrappers to them.
[12] P.A. de C. Guerra, C.M.F. Rubira, A. Romanovsky, and R. de Lemos. A Dependable Architecture for COTS-Based Software Systems using Protective Wrappers, volume 3069 (Architecting Dependable Systems II) of LNCS, pages 147-170. Springer, 2004.
[ bib | .pdf ]

Commercial off-the-shelf (COTS) software components are built to be used as black boxes that cannot be modified. The specific context in which these COTS components are employed is not known to their developers. When integrating such COTS components into systems, which have high dependability requirements, there may be mismatches between the failure assumptions of these components and the rest of the system. For resolving these mismatches, system integrators must rely on techniques that are external to the COTS software components. In this paper, we combine the concepts of an idealised architectural component and protective wrappers to develop an architectural solution that provides an effective and systematic way for building dependable software systems from COTS software components.
[13] N. Jefferson and S. Riddle. Towards a formal semantics of a composition language. Accepted at 3rd Int. Workshop on Composition Languages, Darmstadt, Germany [updated TR to appear], 2003.
[ bib | .pdf ]

Although several composition environments exist that are built on top of object-oriented languages, they fail to supply the level of abstraction required to specify compositions of components. There is therefore a need for pure component-based languages in order to allow the composition developer to focus on the composition from a clear viewpoint, free of any obscurities imposed by existing programming languages that essentially operate at the individual component level. In this paper we make a clear distinction between a composition language and a composition representation. A composition language is any language that allows the specification of a piece of software in terms of its composition whereas a composition representation is the abstract, general, architectural description of a composition. This position paper sets out to formally express the basis for a composition representation. The definition of an abstract representation is necessary in order to derive the formal semantics of a composition language. We believe that this semantic definition should be the initial step in the construction of a high level component-based language.
[14] V. Kharchenko, P. Popov, and A. Romanovsky. On Dependability of Composite Web Services with Components Upgraded Online. In Int. Conf. on Dependable Systems and Networks (DSN '04 - Workshop supplement), pages 287-291, Florence, Italy, 2004.
[ bib | .pdf ]

Ensuring dependability of composite Web services, dynamically composed of component Web services, is an open issue. One of the main difficulties here is due to the fact that component Web services can and will be upgraded online. The challenge is then to ensure that the overall dependability of the composite service is not undermined. The solutions we propose in this position paper make use of natural redundancy present in systems containing a new and an old release of the component.
[15] B. Littlewood and L. Strigini. Redundancy and Diversity in Security, volume 3193 (9th European Symposium on Research in Computer Security, Sophia Antipolis, France - ESORICS '04) of LNCS, pages 423-438. Springer, 2004.
[ bib | .pdf ]

Redundancy and diversity are commonly applied principles for fault tolerance against accidental faults. Their use in security, which is attracting increasing interest, is less general and less of an accepted principle. In particular, redundancy without diversity is often argued to be useless against systematic attack, and diversity to be of dubious value. This paper discusses their roles and limits, and to what extent lessons from research on their use for reliability can be applied to security, in areas such as intrusion detection. We take a probabilistic approach to the problem, and argue its validity for security. We then discuss the various roles of redundancy and diversity for security, and show that some basic insights from probabilistic modelling in reliability and safety indeed apply to examples of design for security. We discuss the factors affecting the efficacy of redundancy and diversity, the role of independence between layers of defense, and some of the trade-offs facing designers.
[16] P. Popov, L. Strigini, and A. Romanovsky. Diversity for Off-The-Shelf Components. In Int. Conf. on Dependable Systems and Networks (DSN '00 - Fast Abstracts supplement), pages B60-B61, New York, USA, 2000.
[ bib | .pdf ]

'Commercial-off-the-shelf' (COTS) or, generally, 'off-the-shelf' (OTS) software items are increasingly used in building systems, instead of only relying on bespoke software items. This trend is driven by a wish to reduce costs, and by some hope that greater re-use of software may lead to higher quality (via more feedback from use). Thus, for instance, the U.S. Dept of Defence policy is now to encourage the use of COTS items. This trend extends to critical systems with high dependability requirements, such as computer-based railway signalling systems by Alcatel (Austria).
[17] P. Popov, L. Strigini, S. Riddle, and A. Romanovsky. Protective Wrapping of OTS Components. In 4th ICSE Workshop on Component-Based Software Engineering: Component Certification and System Prediction, Toronto, Canada, 2001.
[ bib ]

Off-the-shelf (OTS) components are increasingly used in application areas with high dependability requirements. We propose a general approach to developing protective wrappers, in order to integrate OTS items with the rest of the system without reducing the system dependability.
[18] P. Popov, S. Riddle, A. Romanovsky, and L. Strigini. On Systematic Design of Protectors for Employing OTS Items. In Workshop on Component-Based Software Engineering, 27th Euromicro Conf., pages 22-29, Warsaw, Poland, 2001.
[ bib | .pdf ]

Off-the-shelf (OTS) components are increasingly used in application areas with stringent dependability requirements. Component wrapping is a well known structuring technique used in many areas. We propose a general approach to developing protective wrappers that assist in integrating OTS items with a focus on the overall system dependability. The wrappers are viewed as redundant software used to detect errors or suspicious activity and to execute appropriate recovery when possible; wrapper development is considered as a part of system integration activities. Wrappers are to be rigorously specified and executed at run time as a means of protecting OTS items against faults in the rest of the system, and the system against the OTS item's faults. Possible symptoms of erroneous behaviour to be detected by a protective wrapper, and possible actions to be undertaken in response are listed and discussed. The information required for wrapper development is provided by traceability analysis. Possible approaches to implementing “protectors” in the standard current component technologies are briefly outlined.
[19] P. Popov. Reliability Assessment of Legacy Safety-Critical Systems Upgraded with Off-the-Shelf Components, volume 2434 (SAFECOMP '02, Catania, Italy) of LNCS, pages 139-150. Springer, 2002.
[ bib | .pdf ]

Reliability assessment of upgraded legacy systems is an important problem in many safety-related industries. Some parts of the equipment used in the original design of such systems are either not available off-the-shelf (OTS) or have become extremely expensive as a result of being discontinued as mass production components. Maintaining a legacy system, therefore, demands using different OTS components. Trustworthy reliability assurance after an upgrade with a new OTS component is needed which combines the evidence about the reliability of the new OTS component with the knowledge about the old system accumulated to date. In these circumstances a Bayesian approach to reliability assessment is invaluable. Earlier studies have used Bayesian inference under simplifying assumptions. Here we study the effect of these on the accuracy of predictions and discuss the problems, some of them open for future research, of using Bayesian inference for practical reliability assessment.
[20] P. Popov and L. Strigini. Diversity with Off-The-Shelf Components: A Study with SQL Database Servers. In Int. Conf. on Dependable Systems and Networks (DSN '03 - Fast Abstracts supplement), pages B84-B85, San Francisco, USA, 2003.
[ bib | .pdf ]

Fault tolerance is often the only feasible remedy available to a user or integrator when using insufficiently dependable off-the-shelf software products. In particular, modular redundancy with diversity, as e.g. in N-version software, may be an affordable solution, but there has been little study of its practical effectiveness and implementation difficulties with off-the-shelf components. We have started an experiment to help to remedy this situation. We report preliminary observations from the development and early use of the experimental set-up.
[21] P. Popov, L. Strigini, A. Kostov, V. Mollov, and D. Selensky. Software Fault-Tolerance with Off-the-Shelf SQL Servers. In 3rd Int. Conf. on COTS-Based Software Systems (ICCBSS '04), pages 117-126, Redondo Beach, USA, 2004.
[ bib | .pdf ]

With off-the-shelf software, software fault tolerance is almost the only means available for assuring better dependability than the off-the-shelf software offers, without the much higher costs of bespoke development or extra V&V. We report our experience with an experimental setup we have developed with off-the-shelf SQL database servers. First, we describe the use of a protective wrapper to mask the effects of a bug in one of the servers, without depending on an adequate fix from the vendors. We then discuss how to combine the diverse off-the-shelf servers into a diverse modular redundant configuration (N-version software or N-self-checking software). A wrapper guarantees the consistency between the diverse replicas of the database, serving multiple clients, by restricting the concurrency between the client transactions. We thus show that diverse modular redundancy with protective wrapping is a viable way of achieving fault-tolerance with even complex off-the-shelf components, like database servers.
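
The abstract does not say how the results of the diverse servers are compared, so the following is only a generic sketch of majority adjudication as used in N-version software, not the mechanism from the paper. The function name `adjudicate` is hypothetical, and each replica's result is assumed to be already normalised into a hashable value such as a tuple of rows.

```python
from collections import Counter


def adjudicate(results):
    """Return the result agreed on by a strict majority of replicas.

    `results` holds one (hashable) result per diverse replica.
    Raises ValueError when no strict majority exists.
    """
    counts = Counter(results)
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        raise ValueError("replicas disagree: no majority result")
    return value
```

With three replicas, one divergent result is masked. With two replicas, a disagreement can only be detected, not masked, so the caller has to decide how to proceed.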
[22] A. Romanovsky. Exception Handling in Component-based System Development. In 25th Int. Computer Software and Application Conf. (COMPSAC '01), pages 580-586, Chicago, USA, 2001.
[ bib | .pdf ]

Designers of component-based software face two problems related to dealing with abnormal events: developing exception handling at the level of the integrated system and accommodating (and adjusting, if necessary) exceptions and exception handling provided by individual components. Our intention is to develop an exception handling framework suitable for component-based system development by applying general exception handling mechanisms which have been proposed and successfully used in concurrent/distributed systems and in programming languages. The framework is applied in three steps. Firstly, individual components are wrapped in such a way that the wrappers perform activity related to local error detection and exception handling, and signal, if necessary, external exceptions outside the component. At the second step the execution of the overall system is structured as a set of dynamic actions in which components take part. Such actions have important properties which facilitate exception handling: they are atomic, contain erroneous information and serve as recovery regions. The last step is designing exception handling at the action level: each action (i.e. all components participating in it) handles exceptions signalled by individual wrapped components.
[23] A. Romanovsky. On version state recovery and adjudication in class diversity. Computer Systems Science and Engineering, 17(3):159-168, 2002.
[ bib | .pdf ]

The paper proposes a general approach to recovering faulty versions and adjudicating complete states of versions in object-oriented N-version programming which is based on the concepts of the abstract version state and mapping functions. Our recent progress in developing recovery features is reported (the previous results are presented in [1, 2]). We propose employing adjudication of version states as a means for advanced error detection. The properties which the abstract version state and mapping functions should have, in order to be used in both version recovery and state adjudication, are formulated. We introduce state and result adjudication which are useful for object-oriented programming, demonstrate how they can serve the purpose of error detection and discuss situations when the former can be effective (assuming that the latter is always used to guarantee the correctness of results). The paper describes the engineering of abstract version states: we consider three types of programmers involved in N-version programming and show how they share responsibilities and cooperate while applying the approach proposed. The paper discusses important practical issues related to implementation and application of the concepts proposed and demonstrates, with numerous examples, the usability of the approach. A thorough comparison of the existing schemes with our proposal concludes the paper.
[24] L. Strigini. Fault Tolerance Against Design Faults. In H. Diab and A. Zomaya, editors, Dependable Computing Systems: Paradigms, Performance Issues, and Applications. Wiley, 2004.
[ bib ]

This chapter surveys techniques for tolerating the effects of design defects in computer systems, paying special attention to software. Design faults are a major cause of failure in modern computer systems, and their relative importance is growing as techniques for tolerating physical faults gain wider acceptance. Although design faults could in principle be eliminated, in practice they are inevitable in many categories of systems, and designers need to apply fault tolerance for mitigating their effects. Limited degrees of fault tolerance in software - defensive programming - are common, but systematic application of fault tolerance for design faults is still rare and mostly limited to highly critical systems. However, the increasing dependence of system designers on off-the-shelf components often makes fault tolerance a necessary, feasible and probably cost-effective solution for achieving modest dependability improvements at affordable cost. This chapter introduces techniques and principles, outlines similarities and differences with fault tolerance against physical faults, provides a structured description of the space of design solutions, and discusses some design issues and trade-offs.
[25] M.J.P. van der Meulen, P.G. Bishop, and M. Revilla. An Exploration of Software Faults and Failure Behaviour in a Large Population of Programs. In ISSRE '04, Rennes, France, 2004.
[ bib ]

A large part of software engineering research suffers from a major problem - there are insufficient data to test software hypotheses, or to estimate parameters in models. To obtain statistically significant results, a large set of programs is needed, each set comprising many programs built to the same specification. We have gained access to such a large body of programs (written in C, C++, Java or Pascal) and in this paper we present the results of an exploratory analysis of around 29,000 C programs written to a common specification. The objectives of this study were to characterise the types of fault that are present in these programs; to characterise how programs are debugged during development; and to assess the effectiveness of diverse programming. The findings are discussed, together with the potential limitations on the realism of the findings.
[26] M.J.P. van der Meulen. On the use of smart sensors, common cause failure and the need for diversity. 6th Int. Symp. Programmable Electronic Systems in Safety Related Applications, Cologne, Germany, TUV, 2004.
[ bib | .pdf ]

The use of smart sensors in highly critical (safety) applications is still being debated. In this paper, we compare the dependability aspects of deploying smart sensors vs. conventional ones using an FMEA. There appear to be some significant differences. Some failure modes do not exist in conventional sensors, e.g. those involving information overload and timing aspects. Other failure modes emerge through the use of different technologies, e.g. those involving complexity, data integrity and human interface. When using smart sensors we suggest the use of a set of guidelines for their deployment: 1. Do not send data to the smart sensor. 2. Use the smart sensor in burst mode only. 3. Use a smart sensor with the least possible number of operational modes. 4. Use the simplest possible sensor for the application. In redundant sensor configurations, common cause failure becomes the dominant failure scenario. The failure modes of smart sensors suggest that smart sensors might be more susceptible to common cause failure than conventional ones. Dominant are failures having their origin in the human interface, complexity and information overload. The guidelines given will also reduce the probability of common cause failure. In redundant sensor configurations a possible design method is the use of diversity. Diversity has the advantage that it can reduce the probability that two or more sensors fail simultaneously, although this effect is limited by the fact that diverse sensors may still contain the same faults. A disadvantage of diversity can be the increased complexity of maintenance, which in itself can lead to a higher probability of failure of the smart sensors. Whether the use of diversity is advisable depends on the design of the smart sensors and the details of their application.
[27] M.J.P. van der Meulen, S. Riddle, L. Strigini, and N. Jefferson. Protective Wrapping of Off-the-Shelf Components. In 4th Int. Conf. on COTS-Based Software Systems (ICCBSS '05), Bilbao, Spain [to appear], 2005.
[ bib | .pdf ]

System designers using off-the-shelf components (OTSCs), whose internals they cannot change, often use add-on wrappers to adapt the OTSCs' behaviour as required. In most cases, wrappers are used to change functional properties of the components they wrap. In this paper we discuss instead protective wrapping, the use of wrappers to improve the dependability - i.e., non-functional properties like availability, reliability, security, and/or safety - of a component and thus of a system. Wrappers can improve dependability by adding fault tolerance, e.g. graceful degradation, or error recovery mechanisms. We discuss the rational specification of such protective wrappers in view of system dependability requirements, and highlight some of the design trade-offs and uncertainties affecting system design with OTSCs and wrappers, and differentiating it from other forms of fault-tolerant design.
