Modern computing systems typically utilize a large number of computing nodes to perform coordinated computations in parallel or simultaneously. They can exhibit multiple performance states or levels due to statuses or failures of their consistent nodes. Performability analysis is concerned with assessing the probability that the computing system performs at a particular performance level. In the context of performability analysis, these computing systems can be modeled using k-to-l-out-of-n structures. This paper proposes new analytical methods based on binary decision diagrams (BDD) for the performability analysis of large computing systems with unrepairable computing nodes. A new and efficient BDD algorithm that makes full uses of the special k-to-l-out-of-n structure is first proposed for systems with computing node having identical computing powers. New simplification rules are further proposed to generate compact and canonical BDD models for systems with heterogeneous computing nodes characterized by different computing powers. Ordering heuristic is also explored to further reduce the size of BDD models. Examples are provided to illustrate the proposed BDD-based performability analysis methodology as well as its efficiency in analyzing large-scale computing systems.
The minimal cutsets based method for analyzing traditional coherent fault trees, methods based on prime implicants were developed for noncoherent fault tree analysis. Similar to the traditional cutsets-based coherent fault tree analysis methods, the prime implicants-based methods for noncoherent fault trees suffer from the exponential complexity problem. More recently combinatorial methods based on binary decision diagrams (BDD) have been developed for fault tree analysis. Compared to other existing methods, the BDD-based algorithms are more efficient due to their concise representation of Boolean functions as well as the ease of BDD manipulations. The existing BDD-based methods however use a bottom-up scheme to generate BDD models, which can generate a large number of redundant intermediate BDD nodes. They also fail to take advantage of the special structure of k-to-l-out-of-n models to improve the efficiency of the model generation.
We advance the state-of-the-art by proposing new BDD-based methods for the performability analysis of large computing systems with k-to-l-out-of-n structures. Both the system and its components are not repairable during the mission. Novel and efficient BDD model generation procedures are first proposed for systems with computing nodes having identical computing power but non-identical failure distributions, which make a full use of the special k-to-l-out-of-n structure and avoid the large number of intermediate manipulations involved in the traditional bottom-up generation scheme. It is also typical that the computing nodes are non-identical with different failure distributions and different computing power values, for example in modern datacenters or networked computing systems developed using heterogeneous technology and support systems, or incremental deployment processes. Therefore, new simplification rules are proposed to generate compact BDD models for system with heterogeneous computing nodes. Ordering heuristic is also developed to further reduce the size of BDD models.