ARulesXL NewsletterA number of users are creating larger ARulesXL applications and have questions about performance. There are a number of performance factors to understand.
We designed ARulesXL to be easy for beginners, allowing you to simply create a rule set and do RQuery()s against it right away. To make it that easy, and to maintain dependencies, we have to re-load the rules for every query. In other words, for beginning use we traded efficiency for ease-of-use.
As your applications grow, if there are a large number of rules, or a large number of queries, the reloading of the rules will become tedious. For these larger applications we provide a number of techniques for avoiding unnecessary reloading of rules.
You can create a main rule set that contains =LoadRules() functions for the other rule sets. Then those other rule sets are loaded only when they have actually changed. This is because the other rule sets are now arguments to the LoadRules() function, so Excel only calculates the LoadRules() function when the rules actually change.
Assume you call that main rule set MainRules, and the bulk of your application logic is in PriceRules and DiscountRules. MainRules now looks like this:
| MainRules |
| =LoadRules(PriceRules) |
| =LoadRules(DiscountRules) |
| * |
| PriceRules |
| .price = ... |
| ... |
| * |
There is one problem. You would like your queries to reference the main rule set like this:
| =RQuery(MainRules, "FIND .price") |
But .price is defined in the PriceRules rule set.
There are two solutions to the problem.
The quickest is to simply have MainRules inherit from PriceRules:
| MainRules |
| Inherit from PriceRules |
| =LoadRules(PriceRules) |
| =LoadRules(DiscountRules) |
| * |
This illustrates the difference between Excel dependencies and logical rule set dependencies. MainRules now has both:
Now, when a rule in PriceRules changes, the LoadRules(PriceRules) function will get recalculated (loading the new rules), and MainRules will then be marked as changed, so the =RQuery(MainRules, ...) will get recalculated. This is exactly the Excel behavior we want.
Also, the query =RQuery(MainRules, "FIND .price") will start to look for .price in MainRules, fail to find it, and then follow the inheritance chain finding the .price rule in PriceRules. This is exactly the logical behavior we want.
But for this situation, I don't prefer the inheritance because it is not intuitive to have the main rule set inherit from the others. I'd rather see it the other way around.
A more intuitive approach is to have MainRules pass queries down to appropriate rule sets. This has an additional advantage of defining the external interface to the rules of an application, while hiding all of the details of logic rules used to determine .price.
Do do this you create rules in MainRules that redirect the queries:
| MainRules |
| =LoadRules(PriceRules) |
| =LoadRules(DiscountRules) |
| .price = PriceRules:.price |
| * |
Now the query =RQuery(MainRules, "FIND .price") works, but this time because the MainRules .price rule calls the PriceRules .price rules.
When developing a complex rule set, it is often convenient to have a number of =RQuery() cells used for debugging. These queries look for intermediate results, to see if your rules are working as you would like. If you've got a lot of these, they will all be taking time to execute whenever the inputs change.
For example, in a pricing application you might have a number of queries like this during development:
| =RQuery(MainRules, "FIND .price[widget]" |
| =RQuery(MainRules, "FIND .discount") |
| =RQuery(MainRules, "FIND .age_discount") |
You can provide one general purpose RQuery cell like this:
| A | B | |
| 1 | Query: | .age_discount |
| 2 | Answer: | =RQuery(MainRules, "FIND " & B1) |
Now you can put properties in B1 and see what their values are in B2.
If you still want to have multiple diagnostic =RQuery() cells, you can then simply edit out the = to turn them on or off.
Often you will have multiple queries that generate different results from the same inputs. For example, we might want to present the price and discount in a pricing application:
| A | B | |
| 1 | Discount: | =RQuery(MainRules, "FIND .discount") |
| 2 | Price: | =RQuery(MainRules, "FIND .price") |
In this case, both .discount and .price will be based on the same inputs, yet each query will start from the beginning, calculating intermediate results as it goes. For example, each of the two properties above might depend on another property, .age_discount. .age_discount will be recalculated for each query.
You can link the two queries together, so that they share derived results by having one query depend on the other, rather than on the MainRules rule set. This way, .age_discount will only be calculated by the first query.
To do this, instead of having the first argument of RQuery point to a rule set, you have it point to a query:
| A | B | |
| 1 | Discount: | =RQuery(B2, "FIND .discount") |
| 2 | Price: | =RQuery(MainRules, "FIND .price") |
Now, because of Excel dependencies, the query in B2 will get done first, and then the query in B1. The query in B1, because it uses a query rather than a rule set, will reuse any intermediate results from the first query. So, in our example, .age_discount will only be calculated once.
Of course, you can turn off Excel recalculation using the tools/options menu while you are developing rule sets, to avoid constant reloading and recalculation.
If a rule set has a cell with =RTable() in it, then whenever the data changes, the RTable() cell changes, and, due to Excel dependencies, the rule set it is in needs to be reloaded.
| PriceRules |
| =RTable(".inputs", InputRange, True, False) |
| .price = ... |
| ... |
| * |
So in this example, whenever the data in the range InputRange changes, PriceRules is reloaded.
If you are using the data for input and it is changing frequently, and the rule set is large, then it is probably a good idea to separate the RTable() function call.
Often this architecture fits in well with the MainRules design mentioned above. I find it's best to have the main rules gather the data as well as the other rule sets. You can then use cover functions to get to the working rule sets, and use inheritance from them to get back to the data in MainRules:
| MainRules |
| =LoadRules(PriceRules) |
| =LoadRules(DiscountRules) |
| =RTable(".inputs", InputRange, True, False) |
| .price = PriceRules:.price |
| * |
| PriceRules |
| Inherit from MainRules |
| Inherit from DiscountRules |
| .price = ... WHEN ... .inputs[ ... |
| ... |
| * |
This way the RQuery()s can refer to MainRules, and the rules in PriceRules can get the input data that was loaded in MainRules. PriceRules is only reloaded when the actual rules change.
Please do not reply to this newsletter as responses are not read. To contact us, visit http://www.arulesxl.com/company/contact.htm. To unsubscribe, visit http://www.arulesxl.com/mail/mailinglist.htm.
ARulesXL is a trademark of Amzi! inc. Excel is a registered trademark of Microsoft Corporation in the United States and/or other countries.