Calculation speed in financial models has been a topic for a long time, however with the increasing complexity of transactions, rising utilisation of data and need to model ever-more scenarios, model efficiency is more important than ever. Given that we champion the need for models to be Responsive as one of the six guiding themes of the Corality methodology, this is an area we are keenly interested in and would like to share our views.
This topic has been covered by numerous experts in other forums, however the intention of this blog series is to highlight the most significant issues we see in practice and our recommendations on how to remedy them based on our experience building and auditing models for many years.
What slows down models?
There are many reasons models become slow including unavoidable factors such as the scope of the model and complexity of the underlying entities, or avoidable factors including unnecessary complexity, use of excessive functionality and sub-optimal coding of formulae.
We have grouped our key recommendations into three broad categories shown in the table below. These categories will make up the three parts of this blog series, in which this post is Part 1.
|1.||Function selection||Which functions perform best and how to construct formulae efficiently|
|2.||Model structure||How to structure a model for improved performance|
|3.||Intensive functionality||Overly intensive Excel functionality and more efficient alternatives|
The focus of these posts will be on the practical steps you can take to improve model calculation efficiency in a financial modelling context, maintaining the required model functionality and following best practice modelling principles.
Avoid semi-anchored references
The above imagine shows a simple example of a semi-anchored reference, where the first part of the SUM range has been column anchored to allow the range to expand as the formula is copied to the right. The purpose of this formula is to calculate the operations period number by summing all the preceding operations periods flags until the current period. This operation is nested within an IF to only return a result during an operations period.
Whilst this is a logical approach to calculating the operations period, it is highly inefficient. Let’s compare this approach to our recommended approach to this calculation as shown below. In our approach we simply add the current period forecast flag to the prior period forecast period counter and nest this operation within an IF statement to only return a value during an operations period.
Using this calculation each cell in this row will therefore only reference three other cells, so the total number of cell references per row will be three times the number of model periods (let’s call this number “n”). In a 30-year monthly model there will be 30 x 12 = 360 columns (i.e. n=360), therefore this calculation will create 3 x 360 = 1,080 cell references.
In the semi-anchored version there are significantly more cell references, as the formula needs to reference every model period column up to the current period, as well as the flag. Therefore, the number of references for each cell will be i + 1, (where i is the column counter for that cell). The formula for the total number of cell references across the row will be the sum of each of these calculations for all i from 1 to n:
The same logic applies to all kinds of semi-anchored references in calculation rows, and therefore these should be avoided if at all possible. The general principle here is that repeating calculations unnecessarily should always be avoided. This will be a key topic in Part 2 of this series.
Don’t use volatile functions
OFFSET(), INDIRECT(), CELL(), INFO(), RAND(), NOW() and TODAY()
Volatile functions are those whose value may change, even if its precedents do not, and as a result, these formulae must be treated differently by Excel’s calculation engine. Typically, when a change is made to a cell Excel will only recalculate the dependency tree for that cell, i.e. all of the cells whose values depend on the cell that has changed. This saves unnecessary calculations from being performed for cells whose values will not be affected by the change, and therefore increases calculation speed of the model.
For volatile functions this approach cannot be applied because by their definition their results may change even if their precedents do not. As a result, all volatile functions and their dependency trees, (cells that depend on a cell with a volatile function), are always recalculated, regardless of which cells are changed. In the lingo, these cells are known as “dirty cells”.
The bottom line is volatile functions should be avoided whenever possible, especially if they have large dependency trees. For the two volatile functions we most commonly see in models we audit, (i.e. INDIRECT and OFFSET), the INDEX function can often be used as a non-volatile alternative. For an in-depth look at how this can be done and the lesser-known properties of the INDEX function see the further reading section at the bottom of this post.
Use IF logic to prevent unnecessary calculations
As a matter of best practice models should be set up to apply a consistent formula across each row, however many of these calculations are not actually needed for most periods in the model timeline. Take the example of a Refinancing fee; if a facility requires refinancing every five years and the model has a monthly periodicity, then only 1 of every 60 refi fee calculations are required, i.e. 98% of the calculations in this row are unnecessary.
We certainly wouldn’t recommend deleting these unnecessary calculations, however there is another way to prevent these calculations from occurring by using flags and the IF function. The reason this works is that the IF function only calculates the result for the first (or second) nested formulae if the criteria is met (or not met). In our example, by nesting the calculation for Refi fee within an IF statement which depends on the Refi event flag, we would prevent all of the unnecessary refi fee calculations from being performed for periods where a refi event hasn’t occurred and the flag value is 0. More generally, the same approach could be applied to any calculation which only needs to be performed in particular periods by first setting up a flag to identify those periods, then creating a formula which follows the below structure:
=IF([FLAG]=0, 0, [CALCULATION])
If followed consistently this approach will significantly increase calculation speeds and is a must in large and complex models.
Use the fastest functions for single-cell lookups
We recommend LOOKUP, SUMIFS and INDEX-MATCH
When performing lookups, Excel offers many functions to choose from, of which the most commonly used are VLOOKUP, HLOOKUP, LOOKUP, INDEX-MATCH, and less commonly, SUMPRODUCT, SUMIF/SUMIFS and OFFSET.
To narrow down this list, we can exclude:
- OFFSET for being a volatile function
- VLOOKUP/ HLOOKUP as they are strictly inferior to INDEX-MATCH in both functionality and calculation speed (for a detailed explanation of why, see the further reading section below)
- SUMIF as being almost identical to SUMIFS but less flexible, (SUMIFS can be adjusted more easily for additional criteria)
- SUMPRODUCT as performing the same task as SUMIFS but less efficiently (see further reading).
This leaves us with only LOOKUP, INDEX-MATCH and SUMIFS.
 Note INDEX-MATCH is in the process of being replaced by a new function called XLOOKUP, however given this will only be available with Office 365 Excel, we will not include this function because of backwards compatibility issues
From a purely speed perspective LOOKUP and INDEX-MATCH (type 1) are the fastest, followed by INDEX-MATCH (type 0), with SUMIFS the slowest as it is required to scan the entire criteria range whilst the other functions stop once they find a match. We would not recommend making the decision between INDEX-MATCH and SUMIFS purely on a speed basis however, as from a best practice perspective INDEX-MATCH rates poorly across the themes of simplicity and accessibility.
Taking all of this into account, we have created the below decision tree to determine the recommended function to use for a single cell lookup. A full explanation of this decision tree as well as the detailed pros and cons of each function is discussed in our webinar Excel blackbelt: New versions of old functions.
Exhibit 1: Single-cell lookup decision tree
Only use SUMIFS for multi-cell lookups
For returning the sum of multiple cells based on a lookup there are two common alternatives: SUMPRODUCT and SUMIFS. As described above SUMIFS calculates significantly faster, however it also has greater functionality than SUMPRODUCT in being able to efficiently handle entire row or column references. Furthermore, if given a column reference SUMIFS will stop processing once it has reached the end of the used range, whereas SUMPRODUCT will perform the calculation for the entire column.
For these reasons we recommend using SUMIFS over SUMPRODUCT for summation of multi-cell lookups.
Array formulae: More trouble than they’re worth
Array formulae are those that can perform an operation on an array of values rather than just a single value, and similarly can return an array of results rather than a single result.
These formulae can be very powerful tools, replacing hundreds or thousands of formulae with a single formula, however we strongly recommend against using array formulae in a modelling context. This is because array formulae will almost always be slower than non-array alternatives as they typically include many unnecessary calculations, a topic we will cover in more detail within Part 2 of this series.
Array formulae also score poorly across other best practice principles, contradicting the Corality themes of simplicity and accessibility through their inherent complexity. Furthermore, the new array formulae changes currently being rolled out will impact the way existing array formulae work, potentially creating errors in existing models and increasing the risk of not getting it right.
In most modelling contexts, array formulae can be avoided by using a helper column or row, which simply moves part of the calculation into a separate cell, often dramatically reducing the number of calculations required and therefore increasing efficiency. Note this relates to the fundamental modelling principle of step-by-step calculations, which we will discuss in Part 2 of the series.
That’s it for part one. If you are looking for additional resources on this topic, we recommend checking out the links under further reading below. Please note that parts two and three of this series will be coming soon to Corality.com, so check back regularly to catch those!
- Model efficiency:
- Excel performance: Improving calculation performance – microsoft.com
- Speed up Excel in 17 Easy Steps and Calculate Faster – professor-excel.com
- How to Make your Excel Models Faster – accessanalytic.com.au
- 10 Tricks to Fix Slow Excel Spreadsheets (Speed-up Excel) – trumpexcel.com
- Volatile functions
- Why INDEX-MATCH Is Far Better Than VLOOKUP or HLOOKUP in Excel – exceluser.com
- Excel’s SUMIFS or SUMPRODUCT…Which is faster? – exceluser.com
- Array formulae:
- Excel array formulas, functions and constants – examples and guidelines – ablebits.com
- The Complete Guide to INDEX & MATCH, CHAPTER 2: (see section “Method 1: Using helper cells” – xelplus.com