Maintain redundant assets
Operate the assets within design parameters
Secure the assets.
It is possible to achieve higher reliability by using assets of superior quality that fail less often.
7.5.3.2 People and processes
All assets can fail to perform at the required level. Assets engineered and maintained for higher performance tend to have higher MTBF under the same operating conditions. This is more intuitive in the case of engineering artefacts such as hardware and software assets. It is harder to define or measure the reliability of people and process assets even where they clearly contribute to the failure of a service. The unavailability of a service staff member may cause the service to be unavailable. Procedural faults or unhandled exceptions in processes can lead to unavailability of services. The concept of MTBF applies to people and processes even if the actual metrics may be difficult or meaningless. The idea is the same. Higher MTBF means higher reliability.
The coupling between people and process assets helps improve the overall reliability of the system, because improvements in one affect the other. To reduce the stress on people assets, the following motivation (M) and hygiene (H) tactics are useful:
Ensure staff have adequate knowledge and experience (M)
Train, educate, and supervise staff (M)
Reward staff for performing correctly, consistently, and ethically (M)
Develop a culture that promotes quality, efficiency, and ownership of output (M)
Improve the work environment including workplace design, productivity tools, information design, and supporting knowledge systems (H)
Automate tasks with monotony, complexity or low tolerance for variation (H)
Allocate adequate resources to balance workload and to reduce stress (H)
Design the organization to improve specialization and coordination of work (H).
To reduce the stress on process assets the following tactics are useful:
Put processes under the ownership and control of capable groups and individuals
Ensure the processes are fed with necessary knowledge and information
Reduce the in-process time to reduce average workload at any given moment
Reduce the amount of rework to be fed back into processes
Automate tasks where appropriate to reduce variation induced by people assets
Secure the processes from unauthorized use, intrusion, and sabotage.
7.5.4 Maintainability
Services need to be recovered as quickly as possible when they become unavailable to users. Mean Time to Restore Service (MTRS) for a service, system or component is the time taken on average to restore its full functionality. This includes not only any physical repair or replacement, but also all the other factors that contribute towards full functionality. It is possible to estimate the MTRS of a service only when there is sufficient data available about the supporting configuration of service assets. MTRS is a measure that depends on several factors including the following:
Configuration of service assets
Mean time to repair (MTTR) of individual components
Competency of support staff
Resources available including information
Policies, procedures, and guidelines
Redundancy.
Adjusting the above factors, in isolation or in combination, can increase maintainability. Analysis of the way MTRS responds to each factor is useful for improving the design of services and performance in operation. Reducing any of the following factors can reduce MTRS (Figure 7.19):
Time to record
Time to respond
Time to resolve
Time to physically repair or replace
Time to recover.
Figure 7.19 Improvement opportunities within incident lifecycle
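As a rough illustration of the time factors listed above, the sketch below averages each phase over a set of incident records and reports MTRS as the mean total restoration time. The record layout, the sample figures and the function names are assumptions made for illustration; they are not defined in this publication.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass
class Incident:
    """Minutes spent in each phase of one incident (hypothetical record layout)."""
    record: float
    respond: float
    resolve: float
    repair: float
    recover: float

    def total(self) -> float:
        # Time from the start of the incident to full restoration of service.
        return sum(getattr(self, f.name) for f in fields(self))


def mtrs(incidents: list[Incident]) -> float:
    """Mean time to restore service: the average total restoration time."""
    return mean(i.total() for i in incidents)


def mean_phase_times(incidents: list[Incident]) -> dict[str, float]:
    """Average time per phase, to show where a reduction would help most."""
    return {f.name: mean(getattr(i, f.name) for i in incidents)
            for f in fields(Incident)}


# Illustrative data only.
sample = [
    Incident(record=5, respond=10, resolve=30, repair=60, recover=15),
    Incident(record=3, respond=20, resolve=45, repair=0, recover=12),
]
print(mtrs(sample))
print(mean_phase_times(sample))
```

Comparing the per-phase averages with the overall MTRS indicates which phase contributes most, and therefore where an improvement is likely to have the greatest effect.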
It is normal to measure time strictly in real terms of seconds, minutes, hours and days. However, the periodicity of business activity varies between customers and contracts. In situations where the rate of loss to the business is not linear with time, it is useful to measure the time factors indirectly in terms such as cycles, miles, transactions and trades to sense the true impact on business.
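One way to express an outage in business terms is to count the transactions that would normally have been processed while the service was down. The sketch below assumes a hypothetical hourly transaction profile and outages that last a whole number of hours; the figures and the function name are illustrative only.

```python
# Hypothetical expected transactions per hour of the day (index 0 = 00:00-01:00).
HOURLY_TRANSACTIONS = [20] * 8 + [500] * 10 + [80] * 6


def transactions_affected(start_hour: int, duration_hours: int) -> int:
    """Expected transactions falling inside an outage of whole hours.

    Hours wrap around midnight, so the profile repeats every 24 hours.
    """
    return sum(HOURLY_TRANSACTIONS[(start_hour + h) % 24]
               for h in range(duration_hours))


# Two outages of the same clock duration can differ widely in business impact.
night = transactions_affected(start_hour=2, duration_hours=2)
peak = transactions_affected(start_hour=10, duration_hours=2)
print(night, peak)
```

With this profile, a two-hour outage at peak time affects far more transactions than a two-hour outage at night, although both register as the same elapsed time.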
Toolbox Tip
The methods and principles of Design of Experiments (DOE), Six Sigma and system dynamics modelling are useful in developing decision models for maintainability and reliability.
7.5.5 Redundancy
Redundancy is a means of increasing reliability and maintainability of systems. High-availability systems typically have some level of redundancy built in. There are four primary types of redundancy, useful selectively or in combination: active, passive, diverse and homogeneous (Figure 7.20).
7.5.5.1 Active redundancy
Productive capacity of redundant assets is in service all the time. Their use distributes load across the system and promotes a higher MTBF at system and component level, because each component is under less stress. There is minimal disruption to the service because of quick switchover to Hot Standby with replicated capabilities and resources. This type of redundancy is used to support critical services and business activity that cannot tolerate any level of disruption. This option is relatively expensive because it involves asset-specific or dedicated capacity.
7.5.5.2 Passive redundancy
Redundant assets enter service when failures occur. In the meantime they are idle or are used for other purposes. There is switchover time involved. If the service or business activity can tolerate this time, then passive redundancy could be a less expensive alternative to active redundancy. The capacity used is less asset-specific so its cost may be spread across several services or contracts.
7.5.5.3 Diverse redundancy
Diverse redundancy comes from different types of service assets that share certain capabilities but have distinctive strengths and weaknesses. This makes diverse redundancy resistant to a single cause of failure. It is harder to implement because of the integration required between diverse types of assets. This type of redundancy is used when there is high uncertainty about the causes of failure.
7.5.5.4 Homogeneous redundancy
Homogeneous redundancy comes from extra capacity of the same type of service assets. It is useful when there is high certainty about the causes of failure, and sufficient capacity is necessary to support demand. It is simpler to implement and maintain.
Figure 7.20 Choosing the right type of redundancy
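The effect of redundancy on availability can be sketched with the standard parallel-system formula, under the assumption that redundant assets fail independently of one another. That assumption is optimistic when assets of the same type share a cause of failure, which is the situation diverse redundancy is meant to address. The function name and the figures are illustrative and do not come from this publication.

```python
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of a set of redundant assets where any one suffices.

    ASSUMPTION: failures are independent, so the set is unavailable only
    when every asset is unavailable at the same time.
    """
    if not availabilities:
        raise ValueError("at least one asset is required")
    if any(not 0.0 <= a <= 1.0 for a in availabilities):
        raise ValueError("availability must lie between 0 and 1")
    return 1.0 - prod(1.0 - a for a in availabilities)


# Illustrative figures: one asset versus two identical redundant assets.
single = parallel_availability([0.99])
pair = parallel_availability([0.99, 0.99])
print(f"{single:.4f} {pair:.4f}")
```

The sketch does not model switchover time, so it is closer to active redundancy than to passive redundancy, where the time taken to bring the standby asset into service adds to the disruption.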
7.5.6 Time between failures and accessibility
Reliability and maintainability are factors of service availability defined in terms of faults and failures of one or more of the underlying service assets. However, what matters to users is whether they can utilize the service or not. MTBF and MTRS mean little to them unless service levels are degraded or disrupted. The availability of services can be low even when service assets have high MTBF and low MTRS. In the time between failures, users expect the service to be easily accessible for utilization without inconvenience and undue effort on their part. Accessibility of a service is illustrated by the following examples.
An airline decides to improve customer satisfaction by increasing the number of ways for customers to purchase tickets and prepare for travel. It offers an online channel for passengers to check flight status, select seats, check in and print boarding passes before arriving at the airport. It also installs a network of self-service terminals that allow passengers with only carry-on baggage to proceed to the gates without having to wait in line at the counters. The net effect is that of virtually extending the ‘surface area’ of the airport check-in counter to locations convenient to the passenger, such as homes, offices and hotel rooms. The airline staff and other passengers at the airport benefit from reduced congestion. Passengers self-select between airport counters, self-service kiosks and the website channels, based on personal preference. They also respond to incentives offered by the airline to control the arrival of demand at particular locations.
Similarly, a retail bank decides to make frequently requested and simple transactions available on its website and wireless devices such as telephones and personal digital assistants (PDAs).
Both businesses have effectively increased the probability that their services will be easily available for use by their customers. The improvements are not through the MTBF and MTRS factors. The primary factor is accessibility, achieved through a wider area of contact between customers and service assets through well-defined interfaces (Figure 7.21). Increasing the ‘surface area’ of contact of the service delivery system directly results in increased service availability from the users’ perspective.
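The distinction between asset availability and availability as users experience it can be illustrated numerically. The sketch below uses the conventional steady-state relationship between MTBF and MTRS for asset availability. The accessibility factor, and the choice to combine it multiplicatively with asset availability, are simplifying assumptions introduced here for illustration and are not a model given in this publication.

```python
def asset_availability(mtbf_hours: float, mtrs_hours: float) -> float:
    """Steady-state availability: the share of time the assets are working."""
    if mtbf_hours <= 0 or mtrs_hours < 0:
        raise ValueError("MTBF must be positive and MTRS non-negative")
    return mtbf_hours / (mtbf_hours + mtrs_hours)


def perceived_availability(mtbf_hours: float, mtrs_hours: float,
                           accessibility: float) -> float:
    """Availability as seen by users.

    ASSUMPTION: accessibility is the probability (0..1) that a user can
    reach a working service without undue effort, and it multiplies
    asset availability. This is a deliberate simplification.
    """
    if not 0.0 <= accessibility <= 1.0:
        raise ValueError("accessibility must lie between 0 and 1")
    return asset_availability(mtbf_hours, mtrs_hours) * accessibility


# Same assets, different accessibility (illustrative figures).
counter_only = perceived_availability(999.0, 1.0, accessibility=0.70)
more_channels = perceived_availability(999.0, 1.0, accessibility=0.95)
print(f"{counter_only:.3f} {more_channels:.3f}")
```

In this sketch the MTBF and MTRS figures are identical in both cases; only the accessibility factor changes, which mirrors the airline and bank examples where the improvement comes from additional channels rather than from more reliable assets.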