
TensorForest Estimator
TensorForest is a highly scalable implementation of random forests, built by combining a variety of online Hoeffding tree algorithms with the extremely randomized trees approach.
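As background for the Hoeffding tree part of that description, the following minimal sketch shows the general Hoeffding bound that such online tree algorithms typically use to decide when a leaf has seen enough examples to commit to a split. It illustrates the general idea only and is not taken from the TensorForest code; the function names hoeffding_bound and can_split are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range is within this distance of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_score, second_score, value_range, delta, n):
    """Hypothetical helper: commit to the best split once its observed
    advantage over the runner-up exceeds the Hoeffding bound."""
    return (best_score - second_score) > hoeffding_bound(value_range, delta, n)


# With a score range of 1.0, delta = 0.05 and 100 samples, the bound is about 0.12.
print(hoeffding_bound(1.0, 0.05, 100))
```

The bound shrinks as more examples arrive, so a leaf that cannot separate its two best candidate splits simply waits for more data.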
Google published the details of the TensorForest implementation in the paper TensorForest: Scalable Random Forests on TensorFlow by Thomas Colthurst, D. Sculley, Gilbert Hendry, and Zack Nado, presented at the Machine Learning Systems Workshop at the Conference on Neural Information Processing Systems (NIPS) 2016. The paper is available at the following link: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxtbHN5c25pcHMyMDE2fGd4OjFlNTRiOWU2OGM2YzA4MjE.
TensorForest estimators implement the following algorithm:
Initialize the variables and sets:
    Tree = [root]
    Fertile = {root}
    Stats(root) = 0
    Splits[root] = []
Divide the training data into batches.
Repeat:
    For each batch of training data:
        Compute the leaf assignment for each feature vector
        Update the leaf stats in Stats
        For each leaf in the Fertile set:
            if |Splits[leaf]| < max_splits
                then add a split on a randomly selected feature to Splits[leaf]
            else if the leaf is fertile and |Splits[leaf]| = max_splits
                then update the split stats for the leaf
        Calculate the fertile leaves that are finished.
        For every non-stale finished leaf:
            turn the leaf into an internal node with its best scoring split
            remove the leaf from Fertile
            add the leaf's two children to Tree as leaves
        If |Fertile| < max_fertile
            then add the max_fertile − |Fertile| leaves with the highest weighted leaf scores to Fertile and initialize their Splits and split statistics.
Until |Tree| = max_nodes or |Tree| stays the same for max_batches_to_grow batches
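To make the loop above more concrete, here is a minimal single-tree sketch in plain Python. It is not the TensorForest implementation. Several choices are assumptions made for illustration: splits are scored by size-weighted Gini impurity, a leaf counts as finished once split_after_samples examples have updated its split stats, the non-stale check is omitted, and a split that sends every example to one side is discarded. Only max_splits, max_fertile, max_nodes, and max_batches_to_grow come from the algorithm above; Node, find_leaf, split_score, grow_tree, and predict are hypothetical names.

```python
import itertools
import random


def gini(counts):
    """Gini impurity of a list of per-class counts (0.0 when empty)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


class Node:
    def __init__(self, counts):
        self.counts = list(counts)  # Stats(leaf): per-class example counts
        self.splits = []            # Splits[leaf]: (feature, threshold, left, right)
        self.seen = 0               # examples that have updated the split stats
        self.feature = self.threshold = self.left = self.right = None


def find_leaf(node, x):
    """Route feature vector x from node down to a leaf."""
    while node.left is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def split_score(split):
    """Size-weighted Gini impurity of a candidate split (lower is better)."""
    _, _, left, right = split
    return sum(left) * gini(left) + sum(right) * gini(right)


def grow_tree(batches, num_classes, max_splits=5, max_fertile=4, max_nodes=31,
              max_batches_to_grow=10, split_after_samples=20, seed=0):
    rng = random.Random(seed)
    root = Node([0] * num_classes)
    tree, fertile, idle = [root], [root], 0
    for batch in itertools.cycle(batches):        # the Repeat ... Until loop
        size_before = len(tree)
        for x, y in batch:
            leaf = find_leaf(root, x)             # leaf assignment
            leaf.counts[y] += 1                   # update the leaf stats
            if leaf not in fertile:
                continue
            if len(leaf.splits) < max_splits:     # add a random candidate split
                f = rng.randrange(len(x))
                leaf.splits.append((f, x[f], [0] * num_classes, [0] * num_classes))
            else:                                 # update the split stats
                leaf.seen += 1
                for f, t, left, right in leaf.splits:
                    (left if x[f] <= t else right)[y] += 1
        # Assumed finish rule: enough examples have updated the split stats.
        for leaf in [n for n in fertile if n.seen >= split_after_samples]:
            if len(tree) + 2 > max_nodes:
                break
            f, t, left, right = min(leaf.splits, key=split_score)
            if sum(left) == 0 or sum(right) == 0:
                leaf.splits, leaf.seen = [], 0    # degenerate: collect new candidates
                continue
            leaf.feature, leaf.threshold = f, t
            # Children start from the winning split's statistics.
            leaf.left, leaf.right = Node(left), Node(right)
            fertile.remove(leaf)
            tree += [leaf.left, leaf.right]
        if len(fertile) < max_fertile:            # promote the highest-scoring leaves
            ranked = sorted((n for n in tree if n.left is None and n not in fertile),
                            key=lambda n: sum(n.counts) * gini(n.counts), reverse=True)
            fertile += ranked[:max_fertile - len(fertile)]
        idle = 0 if len(tree) > size_before else idle + 1
        if len(tree) + 2 > max_nodes or idle >= max_batches_to_grow:
            break
    return root, tree


def predict(root, x):
    """Return the majority class of the leaf that x falls into."""
    counts = find_leaf(root, x).counts
    return max(range(len(counts)), key=counts.__getitem__)
```

Here batches is a list of batches, each a list of (feature_vector, class_index) pairs, so a call such as grow_tree(batches, num_classes=2) returns the root node and the list of all nodes. A real forest would grow many such trees, and TensorForest performs the per-batch updates as TensorFlow operations rather than Python loops.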
More details of this algorithm implementation can be found in the TensorForest paper.