The use of any single benchmark is inherently misleading because few production systems handle a single type of workload. Transactions and queries may vary widely from application to application, and the overall processing mix is likely to include batch as well as other types of system operations.
Thus, any cost comparison between different systems should be based on quantified workloads that correspond to the current and future requirements of the business.
Database Query Volumes
Particular attention should be paid to query volumes. Although the effects of transaction and batch workloads are well documented for IS environments, large volumes of user-initiated database queries are a more recent phenomenon. Their effects are still poorly understood in most organizations.
Businesses that move to client/server computing commonly experience annual increases in query volumes of 30 to 40 percent. Moreover, these increases may continue for at least five years. Although most transactions can be measured in kilobytes, queries easily run to megabytes of data. Long, sequential queries generate particularly heavy loading. Thus, once client/server computing comes into large-scale use within an organization, heavier demands are placed on processor, database, and storage capacity.
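A rough compounding calculation makes the cumulative effect concrete. The starting volume in the sketch below is purely illustrative; only the 30 to 40 percent growth rates come from the figures cited above.

# A minimal sketch (in Python) of compound query-volume growth.
# The starting volume is an assumed figure; 30 and 40 percent are the
# annual growth rates cited in the text.

def compound(volume, annual_growth, years):
    """Return the volume after compounding annual_growth for the given years."""
    return volume * (1 + annual_growth) ** years

base_queries_per_day = 10_000                  # assumed starting query volume
for growth in (0.30, 0.40):
    grown = compound(base_queries_per_day, growth, 5)
    print(f"{growth:.0%} annual growth: {base_queries_per_day:,} -> {grown:,.0f} queries/day")

# At 30 percent annual growth the volume roughly triples in five years
# (a factor of about 3.7); at 40 percent it grows more than fivefold
# (about 5.4). Because each query may also move megabytes rather than
# kilobytes, the capacity impact is far larger than the query count alone.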
In many organizations, queries can rapidly dominate computing workloads, with costs exceeding those for supporting online transaction processing applications. IS costs in this type of situation normally go up, not down. It is better to plan for this growth ahead of time.
SERVICE LEVELS
Service levels include response time, availability, hours of operation, and disaster recovery coverage. They are not addressed by generic performance indicators such as MIPS or TPC metrics and are not always visible in workload calculations. However, service levels have a major impact on costs.
Many businesses do not factor in service-level requirements when evaluating different types of systems. These requirements need to be explicitly quantified, and the costs for every platform under evaluation should be based on configurations that will meet the required levels.
Response Time
Response time is the time it takes a system to respond to a user-initiated request for an application or data resource. Decreasing response time generally requires additional investments in processors, storage, I/O, or communications capacity, or (more likely) all of these.
In traditional mainframe-based online transaction processing applications, response time equates to the time required to perform a transaction or display data on a terminal. For more complex IS environments, response time is more likely to be the time required to process a database query, locate and retrieve a file, and deliver a document in electronic or hard-copy form. Delivering fast response time according to these criteria is both more difficult and more expensive than it is for traditional mainframe applications.
Availability
Availability is the absence of outages. Standard performance benchmarks, and even detailed measurements of system performance based on specific workloads, provide no direct insight into availability levels. Such benchmarks do not indicate how prone a system will be to outages, nor what it will cost to prevent outages.
Even in a relatively protected data center environment, outages have a wide range of common causes, including bugs in system and applications software, hardware and network failures, and operator errors. When computing resources are moved closer to end users, user error also becomes a major source of disruptions.
Production environments contain large numbers of interdependent hardware and software components, any of which represents a potential point of failure. Even the most reliable system experiences some failures.
Thus, maintaining high availability levels may require specialized equipment and software, along with procedures to mask the effects of outages from users and enable service to be resumed as rapidly as possible with minimum disruption to applications and loss of data. The less reliable the core system, the more such measures will be necessary.
Availability can be delivered at several levels, and the higher levels carry cost premiums for subsystem duplexing and resilient system designs. For example, moving from the ability to restart a system within 30 minutes to the ability to restart within a few minutes can increase costs by orders of magnitude.
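The service-level arithmetic behind that trade-off can be sketched simply. The outage frequency assumed below is hypothetical and would be replaced by figures from the installation's own incident history.

# Minimal sketch: how restart capability translates into annual downtime.
# The outage frequency is an assumed figure for illustration only.

outages_per_year = 12                          # assumption: one unplanned outage a month

for restart_minutes in (30, 5):                # basic restart vs. duplexed/resilient design
    downtime_hours = outages_per_year * restart_minutes / 60
    print(f"Restart within {restart_minutes} minutes: "
          f"about {downtime_hours:.0f} hour(s) of downtime per year")

# With the same outage frequency, cutting restart time from 30 minutes to
# 5 minutes reduces downtime from roughly 6 hours a year to about 1 hour;
# that gain is what the cost premium for duplexing and resilient design buys.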
Hours of Operation
Running multiple shifts or otherwise extending the hours of operation increases staffing requirements. Even if automated operations tools are used, it will usually be necessary to maintain personnel on-site to deal with emergencies.
Disaster Recovery
Disaster recovery coverage requires specialized facilities and procedures to allow service to be resumed for critical applications and data in the event of a catastrophic outage. Depending on the level of coverage, standby processor and storage capacity may be necessary or an external service may be used. Costs can be substantial, even for a relatively small IS installation.
SOFTWARE LOADING
The cost of any system depends greatly on the type of software it runs. In this respect, again, there is no such thing as a generic configuration or cost for any platform. Apart from licenses and software maintenance or support fees, software selections have major implications for system capacity. It is possible for two systems running similar workloads, but equipped with different sets of applications and systems software, to have radically different costs.
For example, large, highly integrated applications can consume substantially more computing resources than a comparable set of individual applications. Complex linkages between and within applications can generate a great deal of overhead.
Similarly, certain types of development tools, databases, file systems, and operating systems also generate higher levels of processor, storage, and I/O consumption.
Exhibit 2 contains a representative list of the resource management tools required to cover most or all of the functions necessary to ensure the integrity of the computing installation. If these tools are not in place, organizations are likely to run considerable business risks and incur excessive costs. Use of effective management tools is important with any type of workload. It is obligatory when high levels of availability and data integrity are required. In a mainframe-class installation, tools can consume up to 30 percent of total system capacity, and license fees can easily run into hundreds of thousands of dollars.
Exhibit 2. A Representative List of Resource Management Tools

System Level
  System Management/Administration

High Availability
  Power/Environmental Monitoring
  Disk Mirroring/RAID
  Fallover/Restart
  Disaster Recovery Planning

Performance Management
  Performance Monitoring/Diagnostics
  Performance Tuning/Management
  Capacity Planning
  Applications Optimization

Network Management
  Operations Management/Control
  Change Management
  Configuration Management
  Problem Management
  Resource Monitoring/Accounting
  Software Distribution/License Control

Storage Management
  Host Backup/Restore
  Hierarchical Storage Management
  Disk Management
  Disk Defragmentation
  Tape Management
  Tape Automation
  Volume/File Management

Operations
  Print/Output Management
  Job Rescheduling/Queuing/Restart

Configuration/Event Management
  Configuration Management
  Change/Installation Management
  Fault Reporting/Management
  Problem Tracking/Resolution

Resource Allocation
  Workload Management
  Load Balancing
  Console Management
  Automated Operations

Data Management
  Database Administration
  Security
  Statistical Analysis/Report Generation

Administrative
  Resource Accounting/Chargeback
  Data Center Reporting
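The 30 percent figure cited before Exhibit 2 is easy to misapply when sizing a system: because the tools consume a share of total capacity, the application requirement must be grossed up by division rather than by a simple markup. The sketch below uses an assumed application requirement to illustrate the calculation.

# Sketch: grossing up installed capacity for resource management tool overhead.
# The application requirement is an assumed figure; 30 percent is the
# tool overhead cited in the text.

application_capacity = 700        # capacity units needed by the applications themselves
tool_overhead = 0.30              # tools may consume up to 30% of *total* capacity

# If tools take 30% of the total, applications get only 70% of what is
# installed, so the total must be application_capacity / (1 - tool_overhead).
total_required = application_capacity / (1 - tool_overhead)
print(f"Total capacity to install: {total_required:.0f} units")    # 1,000 units

# A simple 30 percent markup (700 * 1.3 = 910 units) would undersize the
# system; the correct gross-up is 700 / 0.7 = 1,000 units.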
EFFICIENCY OF IS RESOURCE USE
Capacity Utilization

Most computing systems operate at less than maximum capacity most of the time. However, allowance must be made for loading during peak periods. Margins are usually built into capacity planning to prevent performance degradation or data loss when hardware and software facilities begin to be pushed to their limits. If the system is properly managed, unused capacity can be minimized.
When planning costs, IS managers must distinguish between the theoretical capacity of a system and the capacity actually used. Failure to make this distinction is one of the more frequent causes of cost overruns among users moving to new systems. It may also be necessary to add capacity to handle peak workloads. For example, properly managed disk storage subsystems maintain high levels of occupancy (85 percent and over is the norm in efficient installations), so there is a close relationship between the data volumes used by applications and the disk capacity actually installed. Inactive data is dumped to tape more frequently, reducing disk capacity requirements and the corresponding hardware costs. If a system operates less efficiently, capacity requirements, and hence costs, can be substantially higher even if workloads are the same.
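A short sketch shows how occupancy levels translate into installed capacity and cost. The 2 TB of active data and the 60 percent occupancy for a loosely managed installation are illustrative assumptions; the 85 percent figure is the norm cited above.

# Sketch: installed disk capacity needed to hold the same active data at
# different occupancy levels. The data volume is an assumed figure.

active_data_gb = 2_000                         # assumption: 2 TB of active application data

for occupancy in (0.85, 0.60):                 # efficient vs. loosely managed installation
    installed_gb = active_data_gb / occupancy
    print(f"{occupancy:.0%} occupancy: about {installed_gb:,.0f} GB of installed disk")

# At 85 percent occupancy roughly 2,350 GB must be installed; at 60 percent
# the figure rises to about 3,330 GB, more than 40 percent additional
# hardware for exactly the same data.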
Consolidation, Rationalization, and Automation
Properly applied, the principles of consolidation, rationalization, and automation almost invariably reduce IS costs. Conversely, an organization characterized by diseconomies of scale, unnecessary overlaps and duplications of IS resources, and a prevalence of manual operating procedures will experience significantly higher IS costs than one that is efficiently managed.
For example, in many organizations, numerous applications perform more or less the same function. These applications have few users relative to the CPU and storage capacity they consume and the license fees they incur. Proliferation of databases, networks, and other facilities, along with underutilized operating systems and subsystems, also unnecessarily increases IS costs.
Requirements for hardware capacity can also be inflated by software versions that contain aged and inefficiently structured code. System loading will be significantly less if these older versions are reengineered or replaced with more efficient alternatives or if system, database, and application tuning procedures are used.
Automation tools can reduce staffing levels, usually by eliminating manual tasks. Properly used, these tools also deliver higher levels of CPU capacity utilization and disk occupancy than would be possible with more labor-intensive scheduling and tuning techniques. A 1993 study commissioned by the U.S. Department of Defense compared key cost items for more efficient best-practice data centers with industry averages. Its results, summarized in Exhibit 3, are consistent with the findings of similar benchmarking studies worldwide.