ARulesXL Newsletter

20 February 2006

Contents


Rules 101

So you've started Excel and looked at some of the ARulesXL samples or read through the tutorial, but you are wondering how to add rules to one of your own spreadsheets. This introduction will describe the process a step at a time.

There are three basic building blocks for a rule set:

The process of building a rule set is an iterative one. In each iteration you determine and refine the outputs, inputs and rules, then you write more rules and test them. So the steps are:

  1. Identify the Outputs
  2. Determine the Inputs
  3. Write the Rules Naturally
  4. Determine How You will Use the Outputs
  5. Determine How You will Get the Inputs
  6. Write the First Rules
  7. Test the First Rules
  8. Write More Rules
  9. Test More Rules

1. Identify the Outputs

The first step is to determine what you want the result of your rules to be. Here are some possibilities:

As you can see the range of results can be a simple, single value; a set of interdependent values; or a full-text explanation. ARulesXL can produce all of these.

As an example, suppose we want to advise on what type of shaft to install in a particular golf club (see the 'Product Advice.xls' sample in the beta test release). The output for this system is:

For a loan application the output is 'approved' or 'denied' and an explanation if the loan was denied. For a vaccination analysis, the output is an analysis of whether or not each vaccination given thus far was correct, and a plan for future vaccinations. For an expense report review, the output highlights which expenses did not meet the guidelines along with an explanation for each.

2. Determine the Inputs

The next step is to make a list of the things that you need to know in order to generate the output identified in step 1. Don't worry about identifying every last one. Start with the primary inputs that come to mind. This list will expand (and contract) as you write your rules (later). Also, while you write your rules, you might find inputs that you can calculate (infer) and don't need to obtain from the user or other spreadsheet cells.

Like outputs, inputs can be simple values like answers to questions; or lists; or n-dimensional arrays of values.

For our golf shaft advisor, the inputs are:

The inputs for a vaccination advisor are the age of the child and a list of the vaccinations received each with a date administered and vaccine name. The inputs for loan application are the price of the property, the applicant's credit rating and monthly income, the down payment and the like. The inputs for product pricing are a list of items and the quantity for each one. The inputs for a workflow system are a list of services to be provided or tasks to be accomplished.

3. Write the Rules Naturally

The easiest way to start writing rules is to think of them as a set of goals. The top-level goal is to produce the output(s) identified in step 1. For example, in our shaft advisor, the goal is to provide 'advice' containing:

shaft flex and tip stiffness

If this is your first rule set, you might find it easier to write your rules in plain English (French, Spanish, Chinese, etc.). Try to write them with two phrases:

The general form is:

a=1 and c=2 when x=m and/or y=n

This means that 'a' and 'c' are goals and they are assigned the values 1 and 2 when the 'x' and 'y' goals already have the values m and/or n. Here are some examples:

unit price = 5 when quantity < 10
price = unit price * quantity
status = denied when credit rating < 100
next vaccine = birthdate + 6 months when vaccine count = 0

Goals are the heart of a rule set. When a rule set is queried, it tries to find the value of a goal. To do that, it has to find the values of all the sub-goals for that goal. So for the shaft advisor, our primary goal is to provide advice, and the advice consists of shaft flex and tip stiffness. In order to get a value for advice, the rule engine must get a value for shaft flex and tip stiffness.

               advice
              /      \
      shaft flex    tip stiffness

For the shaft advisor, our first rule in plain English is:

the advice is the recommended shaft flex and recommended tip stiffness

Note this rule is comprised of two sub-goals, shaft flex and tip stiffness. Now we need some rules for them. These rules all use inputs from the user after the when.

shaft flex is L when swing speed < 75
shaft flex is A when swing speed > 75 and < 85
shaft flex is R when swing speed > 85 and < 95
shaft flex is S when swing speed > 95 and < 105

shaft tip is soft when using high degree driver and the player wants a normal to high ball flight
shaft tip is stiff when using a driver < 11 degrees and the players wants a low to normal ball flight
shaft tip is stiff when the player wants accuracy over distance
shaft tip is soft when the player wants distance over accuracy

Write as many rules as come to mind easily. Don't worry about getting the words exactly right, just try to follow the general form.

4. Determine the How You will Use the Outputs

Before you can actually create a rule set, you need to figure out how you are going to use the output from the rule queries. You could query for one goal per spreadsheet cell. For the golf shaft advisor, we could return the shaft flex in cell B2 and shaft stiffness in cell B3. Those values could then be used by other spreadsheet formulas and functions. This form of output is useful for any rule set that returns numeric values, like prices, for totalling.

Another option is to return a sentence or paragraph for the spreadsheet user to read. That's what the shaft advisor does. The top-level goal is called '.advice', which is the name of an object in ARulesXL. An object name begins with a period (.) and it has a value associated with it. The value can be a number, word, sentence or entire document. The latter forms are not useful in a spreadsheet formulas or functions, but they are very good for regulatory or legal applications.

You can also return an array of values (this is called an array query in Excel). In this case your goals would need to be an array. In ARulesXL an array object uses square brackets around the indices. For the shaft advisor sub-goals, we use an array object consisting of two elements:

.shaft[flex]
.shaft[tip]

ARulesXL lets you use words or number as indices, which makes the rules easier to read. We could have used a separate object for each goal, for example, .shaft_flex and .shaft_tip.

In sophisticated applications you can return a multi-dimensional array of values (in essence a table). A vaccination advisor returns a table that analyzes each vaccination given to date, and another table that is a schedule of vaccinations to give in future.

Since rule sets can have multiple goals, you might return different types of values for each goal. For example, a loan application returns a simple yes/no value for whether or not the application is approved, and then a sentence or two explaining if it was denied.

5. Determine How You will Get the Inputs

Inputs come from the spreadsheet user entering values into cells, or from the results of spreadsheet formulas. If the user is responding to a list of questions, the answers can be easily brought into the rule set using the RTable() function. For the shaft advisor, this is done as follows:

=RTable(".in", A2:B5, TRUE, FALSE, TRUE)

where cells A2 to B5 contain the following values:

 
A
B
2 Swing Speed 77
3 Club Type Fairway Wood
4 Favor Accuracy
5 Ball Flight Normal

This creates an array object named '.in'. The first TRUE says the table has row headers (e.g. Swing Speed, Favor). The FALSE says the table does not have column headers in the first row (in other words, the first row contains data values). The last TRUE says we want this as a 1-dimensional vector. So the result of this RTable() is:

.in['Swing Speed'] =      77
.in['Club Type'] = 'Fairway Wood'
.in[Favor] = Accuracy
.in['Ball Flight'] = Normal

Note, names and values with spaces in them are quoted.

In addition to RTable(), you can use the RCell() function to get a value from a spreadsheet cell. For example

=RCell(".discount", D8)

sets the .discount object to the value in spreadsheet cell D8.

RTable() and RCell() functions must be inside a rule sets. The name of the rule set in the shaft advisor is 'ShaftRules'. The ARulesXL menu contains commands to create, rename and manage rule sets.

If you want to check if the inputs in the shaft advisor were set correctly, you can type this rule set query function in any cell:

=RQuery(ShaftRules, "FIND .in['Club Type']")

this should return the value 'Fairway Wood'.

You can also use RTable() to incorporate tables of values in your rules. The 'Product Advice.xls' spreadsheet that this introduction is based on does exactly that to determine the shaft flex from the swing speed.

6. Write the First Rules

The first rule returns the value for one of your goals. If you query your rule set for multiple values, then you start by writing a rule for each of those goals. For the shaft advisor, the top-level goal is:

the advice is the recommended shaft flex and recommended tip stiffness

In ARulesXL, we write that as follows:

.advice = "Flex is " & .shaft[flex] & " Tip is " & .shaft[tip]

This rule reads as "the advice is a text string that includes the recommended shaft flex and the recommended shaft tip stiffness ". Notice the advice goal requires finding values for two sub-goals: the flex and the tip. The next step is to write a couple of rules for each of the sub-goals. Referring back to our plain English rules:

shaft flex is L when swing speed < 75
shaft flex is A when swing speed > 75 and < 85

shaft tip is stiff when the player wants accuracy over distance
shaft tip is soft when the player wants distance over accuracy

The rules are written as follows:

.shaft[flex] = L WHEN .in['Swing Speed'] <= 75
.shaft[flex] = A WHEN .in['Swing Speed'] > 75 and .in['Swing Speed'] <= 85

.shaft[tip] = Stiff WHEN .in[Favor] = Accuracy
.shaft[tip] = Soft WHEN .in[Favor] = Distance

Note on the left side we are setting the value of one of our sub-goals (flex or tip stiffness), and the conditions (after the WHEN) refer to the user inputs loaded by RTable().

Some rule sets have multiple top-level goals. The value for a top-level goal is retrieved by using the RQuery() function.

7. Test the First Rules

In the shaft advisor, we test the rules with this query:

=RQuery(ShaftRules, "FIND .advice")

This says query the 'ShaftRules' rule set and set the value of the current cell to the value of the .advice object. You should get:

Flex is A Tip is Stiff

if your inputs are set as shown above. When testing rule sets, change the inputs to get different results to test all the rules. Notice that the rules are reapplied when the input data changes due to Excel's spreadsheet recalculation.

Let's examine how the query actually works with a very simple example that contrasts traditional procedural execution (an ordered set of steps like Excel spreadsheet recalculation) with how a rule engine operates. Take the following steps:

a = 2
b = 3
c = a + b

In a procedural system, the steps run in the order given. First 'a' is set to 2. Then 'b' is set to 3. Then 'c' is set to 5. However, if these are rules you could put them in this order (or any order for that matter):

c = a + b
a = 2
b = 3

In a rule set you query for the value of 'c'. The rule engine finds a rule that can provide a value for 'c' (c = a + b) and sees that it needs to get a value for 'a'. So it looks for a rule that provides that value (a = 2). Next, it needs a rule that provides a value for 'b' (b = 3). Finally, since now it know the values of 'a' and 'b', it can calculate the value of 'c'. This is the essence of the process of querying rules (inferencing).

Let's look at how the first rules in the shaft advisor work. The query is to find a value for .advice. The rule engine does the following:

  1. Find a rule that returns a value for .advice. There is only one:
    .advice = "Flex is " & .shaft[flex] & " Tip is " & .shaft[tip]
  2. Find the value for .shaft[flex]. To do this:
    1. Check if the value is known for .shaft[flex]. The known list is:
      .in['Swing Speed'] =      77
      .in['Club Type'] = 'Fairway Wood'
      .in[Favor] = Accuracy
      .in['Ball Flight'] = Normal
      All these values came from the RTable() function. The value for .shaft[flex] is not known.
    2. Find a rule that returns a value for .shaft[flex]. The first one is:
      .shaft[flex] = L WHEN .in['Swing Speed'] <= 75
    3. Check if the value is known for .in['Swing Speed']. It is 77 (from RTable()).
    4. Test the conditions to see if 77 <= 75. It is not.
    5. Find another rule that returns a value for .shaft[flex]. The next one is:
      .shaft[flex] = A WHEN .in['Swing Speed'] > 75 and .in['Swing Speed'] <= 85
    6. Check if the value is known for .in['Swing Speed']. It is 77 (from RTable()).
    7. Test the conditions to see if 77 > 75 and <= 85. It is.
    8. Set the value of .shaft[flex] to A.
  3. Find the value for .shaft[tip]. To do this:
    1. Check if the value is known for .shaft[tip]. The known list is:
      .shaft[flex] = A
      .in['Swing Speed'] =      77
      .in['Club Type'] = 'Fairway Wood'
      .in[Favor] = Accuracy
      .in['Ball Flight'] = Normal
      The value for .shaft[tip] is not known.
    2. Find a rule that returns a value for .shaft[tip]. The first one is:
      .shaft[tip] = Stiff WHEN .in[Favor] = Accuracy
    3. Check if the value is known for .in[Favor]. It is Accuracy (from RTable()).
    4. Test the conditions to see if Accuracy = Accuracy. It does.
    5. Set the value of .shaft[tip] to Stiff.
  4. Set the value for .advice to the values of .shaft[flex] and .shaft[tip] along with the text strings.

If you are unfamiliar with how rules work, take a close look at the ARulesXL debugger (on the ARulesXL menu). It will help you understand how the rule engine finds and matches rules. Once you understand how this work (the process is called inferencing), it will become easier to write more rules.

8. Write More Rules

Now we can write additional rules for the sub-goals .shaft[flex] and .shaft[tip]. These additional rules will use more of the inputs identified in steps 2 and 5. When writing new rules, you will probably find new inputs you need to add to your rule set.

The order of the rules in your rule set may affect the results. In general, you want to put the more specific rules first and the defaults last. When the rule engine looks for a rule for a particular sub-goal, it starts with the first one in the rule set, and tries to match the conditions on the WHEN side. So the rules for .shaft[tip] are ordered from the ones with the most conditions to the ones with the least as follows:

.shaft[tip] = Soft WHEN .in['Club Type'] = 'Driver < 11 Degrees' AND .in['Ball Flight'] = High OR .in['Ball Flight'] = Normal
.shaft[tip] = Stiff WHEN .in['Club Type'] = 'Driver < 11 Degrees' AND .in['Ball Flight'] <> High
.shaft[tip] = Soft WHEN .in['Club Type'] = 'Driver < 11 Degrees' AND .in[Favor] = Distance
.shaft[tip] = Stiff WHEN .in['Club Type'] = 'Driver >= 11 Degrees'
.shaft[tip] = Stiff WHEN .in['Club Type'] = 'Hybrid/Utility' OR .in[Favor] = Accuracy
.shaft[tip] = Soft WHEN .in['Club Type'] <> 'Hybrid/Utility' OR .in[Favor] = Distance
.shaft[tip] = Stiff WHEN .in[Favor] = Accuracy
.shaft[tip] = Soft WHEN .in[Favor] = Distance

Hence the latter rules are more general and provide default values when more specific cases do not apply.

9. Test More Rules

Everytime you add a new set of rules, test them by changing your inputs. If you do not understand the results, use the ARulesXL debugger to learn how your rules are selected and matched.

Also your system may grow in complexity with additional sub-goals. For example, our shaft advisor could recommend shaft bend point, weight and material. This would add three additional sub-goals onto the main advice goal.

For more complex systems you might have many sub-goals. And each sub-goal may itself have additional sub-goals. These complex systems are where rule technology really shines because the ARulesXL engine determines which sub-goals to process and the order to process them based on the inputs. Each time the inputs change, a different set of rules are used, often in a different order. But, you don't have to worry about that. This makes it easier to write and maintain rules as compared to technologies that do their processing step-by-step.


We invite all feedback on this 'Rules 101' introduction. Please let us know how we can improve it.

 


www.arulesxl.com

ARulesXL is a trademark of Amzi! inc. Excel is a registered trademark of Microsoft Corporation in the United States and/or other countries.