What does the leaf value mean in the following XGBoost model tree diagram?

[Image: XGBoost model tree diagram]

I assume this is a conditional probability, given that the conditions above it (the tree branch) hold. However, I am not sure I understand it.

If you want to know more about the data used or how to get this diagram, follow the link: http://machinelearningmastery.com/visualize-gradient-boosting-decision-trees-xgboost-python/

4 answers

The leaf attribute is the predicted value. In other words, if the tree model evaluation ends at this terminal node (aka leaf node), then this is the return value.

In the pseudo-code (the leftmost branch of your tree model):

 if (f1 < 127.5) {
   if (f7 < 28.5) {
     if (f5 < 45.4) {
       return 0.167528f;
     } else {
       return 0.05f;
     }
   }
 }
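To make the "return value" idea concrete, here is a minimal Python sketch that stores the leftmost branch as nested dictionaries and walks it for one data point. The dictionary layout and the names `LEFTMOST_PATH` and `leaf_value` are hypothetical, not XGBoost's internal format; subtrees that the pseudo-code does not show are left as `None`, and XGBoost's handling of missing values (a default direction at each split) is ignored.

```python
from typing import Optional

# A node is a leaf {"leaf": value}, an internal split, or None for a subtree
# that is not reproduced here (the rest of the diagram).
LEFTMOST_PATH = {
    "feature": "f1", "threshold": 127.5,
    "yes": {
        "feature": "f7", "threshold": 28.5,
        "yes": {
            "feature": "f5", "threshold": 45.4,
            "yes": {"leaf": 0.167528},
            "no": {"leaf": 0.05},
        },
        "no": None,  # subtree not reproduced in this sketch
    },
    "no": None,      # subtree not reproduced in this sketch
}


def leaf_value(node: Optional[dict], x: dict) -> float:
    """Walk the splits for data point x and return the value of the leaf it lands in."""
    while node is not None and "leaf" not in node:
        # As in the pseudo-code, a point goes to the "yes" child when feature < threshold.
        node = node["yes"] if x[node["feature"]] < node["threshold"] else node["no"]
    if node is None:
        raise ValueError("data point falls into a subtree not reproduced here")
    return node["leaf"]
```

For example, `leaf_value(LEFTMOST_PATH, {"f1": 100.0, "f7": 20.0, "f5": 30.0})` ends in the first leaf and returns 0.167528, exactly like the pseudo-code.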

For a classification tree with 2 classes {0,1}, the leaf value represents the raw score (on the log-odds scale) for class 1. It can be converted into a probability estimate using the logistic function. The calculation below uses the leftmost leaf as an example.

 import numpy as np
 1 / (1 + np.exp(-1 * 0.167528))  # = 0.5417843204057448

This means that if a data point ends up in this leaf, the probability that it belongs to class 1 is 0.5417843204057448. (Strictly, this holds only if this tree were the whole model; with several trees, the raw values of all the leaves the point reaches are summed before the logistic function is applied.)
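The same conversion as a small, self-contained Python sketch, using `math.exp` instead of NumPy; the helper name `sigmoid` is illustrative. Like the calculation above, it treats each leaf value as if this tree were the whole model.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Logistic function: map a raw score on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-raw_score))


# Both leaves of the leftmost branch, each read as if this tree were the whole model.
for leaf in (0.167528, 0.05):
    print(f"raw score {leaf} -> P(class 1) = {sigmoid(leaf):.4f}")
```

The first leaf gives about 0.5418 and the second about 0.5125; a raw score of 0 would give exactly 0.5.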


You're right. The probability values associated with leaf nodes represent the conditional probability of reaching those leaf nodes, given a particular tree branch. Tree branches can be represented as a set of rules; for example, as @user1808924 showed in his answer, one rule represents the leftmost branch of your tree model.

So, in short: the tree can be linearized into decision rules, where the outcome is the content of the leaf node and the conditions along the path form a conjunction in the if clause. In general, the rules take the form:

 if condition1 and condition2 and condition3 then outcome. 
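The linearization can be sketched in Python as a recursive walk that collects the split conditions on each root-to-leaf path and emits one rule per leaf. The nested-dictionary tree `TREE` and the function name `rules` are hypothetical; only the leftmost branch from the pseudo-code above is filled in, and the other subtrees are `None`.

```python
from typing import Iterator, Optional

# Hypothetical tree: only the leftmost branch is known; None marks omitted subtrees.
TREE = {
    "feature": "f1", "threshold": 127.5,
    "yes": {
        "feature": "f7", "threshold": 28.5,
        "yes": {
            "feature": "f5", "threshold": 45.4,
            "yes": {"leaf": 0.167528},
            "no": {"leaf": 0.05},
        },
        "no": None,
    },
    "no": None,
}


def rules(node: Optional[dict], conditions: tuple = ()) -> Iterator[tuple]:
    """Yield one (conditions, outcome) pair per leaf: the conjunction of splits on its path."""
    if node is None:  # subtree not reproduced in this sketch
        return
    if "leaf" in node:
        yield list(conditions), node["leaf"]
        return
    f, t = node["feature"], node["threshold"]
    yield from rules(node["yes"], conditions + (f"{f} < {t}",))
    yield from rules(node["no"], conditions + (f"{f} >= {t}",))


for conds, outcome in rules(TREE):
    print("if " + " and ".join(conds) + f" then {outcome}")
```

This prints `if f1 < 127.5 and f7 < 28.5 and f5 < 45.4 then 0.167528` followed by `if f1 < 127.5 and f7 < 28.5 and f5 >= 45.4 then 0.05`.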

Decision rules can be generated by constructing association rules with the target variable on the right. They may also indicate a temporal or causal relationship.


If this is a regression model (the objective could be reg:squarederror), then the leaf value is this tree's prediction for a given data point. The leaf value may be negative depending on your target variable. The final prediction for this data point is the sum of the leaf values across all trees for that point, plus a global bias set by the base_score parameter.
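A minimal sketch of that sum in Python, assuming you already know which leaf the point reached in each tree. The leaf values and the `base_score` of 0.5 below are made-up numbers for illustration, and `regression_prediction` is a hypothetical helper, not an XGBoost function.

```python
def regression_prediction(leaf_values: list[float], base_score: float) -> float:
    """Global bias plus the value of the leaf reached in each tree."""
    return base_score + sum(leaf_values)


# Hypothetical leaf values one data point reached in a 3-tree model
# (leaf values can be negative).
leaves = [1.25, -0.40, 0.10]
print(regression_prediction(leaves, base_score=0.5))
```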

If this is a classification model (the objective could be binary:logistic), then the leaf value is a raw score related to the probability that the data point belongs to the positive class. The final probability prediction is obtained by summing the leaf values (raw scores) across all trees, plus a bias derived from base_score, and mapping the total into the range 0 to 1 with the sigmoid function. A leaf value (raw score) may be negative; a raw score of 0 corresponds to a probability of 1/2.
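For the binary case, here is the same idea with the sigmoid applied at the end, again as a hypothetical plain-Python helper (`predict_proba` here is not the XGBoost method of that name). It assumes the bias in raw-score space (`base_margin`) is 0, which is what a base_score of 0.5 maps to under the logit; the leaf values are invented.

```python
import math


def predict_proba(leaf_values: list[float], base_margin: float = 0.0) -> float:
    """Sum raw leaf scores over all trees, then map the total to (0, 1) with the sigmoid."""
    margin = base_margin + sum(leaf_values)
    return 1.0 / (1.0 + math.exp(-margin))


# Hypothetical leaf values one data point reached in a 3-tree model.
print(predict_proba([0.167528, -0.30, 0.05]))  # negative total, so below 1/2
print(predict_proba([0.2, -0.2]))              # raw score 0, so exactly 0.5
```

Leaves that cancel out, such as `[0.2, -0.2]`, give a raw score of 0 and therefore a probability of exactly 0.5.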

You can find more detailed information on parameters and outputs at https://xgboost.readthedocs.io/en/latest/parameter.html


Source: https://habr.com/ru/post/1260687/

