Project Estimating

Many information systems projects exceed their original budgets. It seems to be almost impossible to estimate reliably how much an information system will cost and how long it will take to develop, and it is just as difficult to keep a project on course once an estimate has been accepted.

There are a number of reasons why software is so hard to estimate. Chief among them is the unique nature of software as a non-manufactured engineering artifact. There are no raw materials to speak of (except perhaps licenses for third-party products that are incorporated) and little in the way of plant (except for the cost of system software, such as compilers). In other words, the largest part of the cost of an information system is the cost of the development process. As a first approximation, therefore, 'How much will it cost?' is the same question as 'How long will it take?'

Thus, to estimate the cost of an information system, you need to know how much the developers cost (direct labor costs plus overheads) and how much development effort will be required. By convention, we use the units dollars per person month and person months, respectively. Once you have that estimate, you can add the cost of licenses and other capital expenditures. The real estimate is the estimate of effort.

Retrospective studies of actual projects in many organizations have shown many factors to be significant cost drivers. The most significant, however, is always size. When you have an existing information system and cost data for it, measuring its size is fairly trivial: you just count the number of lines of code. (The usual unit is thousands of lines of code, or KLOC.) Of course, when you are predicting the effort of a planned project, you have no lines of code to count.
It turns out to be possible to make educated estimates, however, based on analogy with other systems and by decomposing the planned system into the parts that are probably necessary and estimating the size of each part independently. That way, independent estimation errors are likely to cancel each other out, and the combined estimate may be quite accurate. Systematic errors, such as wishful thinking, cannot be removed this way.

There are a number of cost estimation models based on KLOC estimates. Most are proprietary, and all should be used with caution, because they are only valid for projects that are similar to those that were sampled to derive the model.

Question: What is a line of code, anyway? Might there be a better measurement of size?

The COCOMO Models

Barry Boehm and his colleagues made a detailed study of projects at TRW over many years and developed a series of cost estimation models, called COCOMO, for predicting effort. There are three COCOMO models:

- Basic COCOMO is the simplest model and should be used for rough estimates.
- Intermediate COCOMO is more detailed and takes into account a number of factors that are ignored in the Basic model. These additional factors are called cost drivers.
- Advanced COCOMO is even more detailed than the Intermediate model and gives separate estimates for the different phases of a project. (These phases presuppose a waterfall lifecycle, so the Advanced model cannot be used for a spiral or iterative project.)

All three models have two equations. One predicts the effort (in person months) for the project. The other predicts the duration of the project (in months). They take the form:

    E = a(KLOC)^b
    D = c(E)^d

Here E is the effort in person months, D is the duration in months, and KLOC is thousands of lines of delivered source code (i.e. the size of the system).

Question: Why isn't the duration a simple linear function of effort and staffing?

In the effort equation, the exponent b is usually slightly greater than 1. This reflects a diseconomy of scale.

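As a rough illustration, the decomposition approach and the two equations above can be combined in a few lines of Python. This is only a sketch: the component names and their sizes are hypothetical, and the values chosen for a, b, c, and d are illustrative rather than calibrated for any real organization.

```python
def effort_person_months(kloc, a, b):
    """Effort equation: E = a * KLOC**b, in person months."""
    return a * kloc ** b


def duration_months(effort, c, d):
    """Duration equation: D = c * E**d, in months."""
    return c * effort ** d


# Hypothetical size estimates (in KLOC) for the parts of a planned system.
# Each part is estimated independently and the estimates are then summed.
component_kloc = {
    "data entry screens": 4.0,
    "reporting": 3.0,
    "database access layer": 2.0,
    "interfaces to other systems": 1.0,
}

# Illustrative constants only (an assumption for this sketch).
a, b, c, d = 3.0, 1.12, 2.5, 0.35

total_kloc = sum(component_kloc.values())
effort = effort_person_months(total_kloc, a, b)
duration = duration_months(effort, c, d)

# Average staffing level implied by the two estimates.
average_staff = effort / duration

print(f"Size:     {total_kloc:.1f} KLOC")
print(f"Effort:   {effort:.1f} person months")
print(f"Duration: {duration:.1f} months")
print(f"Average staffing: {average_staff:.1f} people")
```

Note that the duration comes from the second equation, not from dividing effort by a chosen team size. The average staffing level falls out of the two estimates.
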
Question: What is the effect on effort of values for b between 0 and 1, exactly 1, and greater than 1? What real-world project phenomena do the differences reflect?

Types of Project

How difficult a project is, and therefore how much the information system will cost, depends on a number of correlated factors. In the Intermediate COCOMO model, factors a and c in the effort and duration equations are the products of constants and a number of cost drivers that collectively reflect how difficult or straightforward the project is. For example, there are cost drivers that take into account the developers' experience with applications of the same type and the maturity of the system software on which the application is to be built (Windows NT would have a higher driver than Unix, for example).

In the Basic model, however, the difficulty or straightforwardness of the project is reflected in the choice among three sets of constants for a, b, c, and d. Which set of constants you choose depends on the type of project you have. There are three types of project: organic, semi-detached, and embedded.

- Organic: Relatively small and simple projects. Small development teams that have relevant application experience. The requirements are not rigid. The development process is fairly informal.
- Semi-detached: Either intermediate between organic and embedded, or a project with a mixture of organic and embedded subsystems.
- Embedded: Tight hardware and software constraints. Rigid requirements. Strict enforcement of standards. Much documentation.

Question: Think up an example of each.

Basic COCOMO constants for the three types of project:

    Project type     a      b      c      d
    Organic          2.4    1.05   2.5    0.38
    Semi-detached    3.0    1.12   2.5    0.35
    Embedded         3.6    1.20   2.5    0.32
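
To see how much the choice of project type matters, the constants in the table above can be applied to a single size estimate. The sketch below does this in Python and then turns effort into cost as described at the start of these notes (labor rate times effort, plus capital expenditures). The size, the labor rate, and the capital cost are hypothetical figures chosen only for illustration, and the function names are not part of COCOMO itself.

```python
# Basic COCOMO constants (a, b, c, d), copied from the table above.
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, project_type):
    """Return (effort in person months, duration in months)."""
    a, b, c, d = BASIC_COCOMO[project_type]
    effort = a * kloc ** b          # E = a(KLOC)^b
    duration = c * effort ** d      # D = c(E)^d
    return effort, duration


def project_cost(effort, rate_per_person_month, capital_costs=0.0):
    """Labor cost (rate times effort) plus licenses and other capital costs."""
    return effort * rate_per_person_month + capital_costs


# Hypothetical inputs, for illustration only.
SIZE_KLOC = 32.0
RATE = 10_000.0      # dollars per person month, including overheads
CAPITAL = 50_000.0   # licenses and other capital expenditures

for project_type in BASIC_COCOMO:
    effort, duration = basic_cocomo(SIZE_KLOC, project_type)
    cost = project_cost(effort, RATE, CAPITAL)
    print(f"{project_type:14s} effort = {effort:6.1f} person months, "
          f"duration = {duration:5.1f} months, cost = ${cost:,.0f}")
```

Running the loop shows the same 32 KLOC system requiring markedly more effort as the project type moves from organic to embedded, which is why choosing the right row of the table matters as much as the size estimate.
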