Decision trees are very flexible, easy to understand, and easy to debug. They work with both classification and regression problems, so whether you are trying to predict a categorical value, for example (red, green, up, down), or a continuous value, for example 2.9, 3.4, etc., a decision tree will handle it. Probably one of the coolest things about decision trees is that they only need a table of data and will build a classifier directly from that data, without any up-front design work. To some extent, features that do not matter will not be selected as splits and will eventually be pruned out, so the method is quite tolerant of nonsense. To start with, it is set it and forget it.
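As a rough sketch of that "just hand it a table of data" point (scikit-learn is my choice here, not something this answer depends on, and the data is a toy table):

```python
# Minimal sketch: decision trees built straight from a small table of data,
# one for a categorical target and one for a continuous target.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# A tiny "data table": each row is [feature_1, feature_2]
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 2]]

# Categorical target, e.g. "red"/"green"
y_cat = ["red", "red", "green", "green", "green", "red"]
clf = DecisionTreeClassifier().fit(X, y_cat)
print(clf.predict([[2, 1]]))   # -> a class label such as 'green'

# Continuous target, e.g. 2.9, 3.4, ...
y_num = [2.9, 3.4, 1.1, 0.7, 5.0, 4.2]
reg = DecisionTreeRegressor().fit(X, y_num)
print(reg.predict([[2, 1]]))   # -> a real-valued prediction
```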
However, there are downsides. Simple decision trees tend to overfit the training data more than other techniques, which means you generally have to prune them and tune the pruning procedure. You had no up-front design cost, but you pay it back in tuning the tree's performance. Also, simple decision trees divide the data into axis-aligned boxes, so building clusters around things means the tree has to split a lot to enclose a cluster of data. Splitting a lot leads to complex trees and raises the probability of overfitting. Deep trees get pruned back, so even when the tree can build a cluster around some feature in the data, that structure may not survive the pruning process. There are other techniques, such as surrogate splits, that let you split on several variables at once, creating partitions in the space that are neither horizontal nor vertical (0 < slope < infinity). Cool, but your tree starts to become harder to understand, and these algorithms are more complex to implement. Other techniques, such as boosting and random forests, can perform quite well, and some feel these techniques are essential to get the best performance out of decision trees. Again, this adds more things to understand, use, and tune, and therefore more things to implement. In the end, the more we add to the algorithm, the higher the barrier to using it.
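To make the pruning/tuning cost concrete, here is a sketch assuming scikit-learn's cost-complexity pruning (the `ccp_alpha` parameter) and a stand-in dataset; the point is just that you end up looping over pruning strengths and validating each one:

```python
# Sketch of the tuning step: sweep cost-complexity pruning strengths and
# keep the alpha that scores best on held-out validation data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Candidate pruning strengths from the cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_val, y_val)          # validation accuracy
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha={best_alpha:.5f}, validation accuracy={best_score:.3f}")
```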
Naive Bayes requires you to build the classification by hand. There is no way to just toss a bunch of tabular data at it and have it pick the best features to classify with. Picking which features matter is up to you. Decision trees will pick the best features for you from tabular data. If Naive Bayes had a way to pick features, you would be getting close to the same techniques that make decision trees work. Given that, you may need to combine Naive Bayes with other statistical techniques to help guide you toward which features classify best, and those techniques could be decision trees. Naive Bayes answers as a continuous classifier. There are techniques to adapt it to categorical prediction, but it will answer in terms of probabilities, such as (A 90%, B 5%, C 2.5%, D 2.5%). Naive Bayes can perform quite well, and it does not overfit nearly as much, so there is no need to prune or post-process the model. That makes it a simpler algorithm to implement. However, it is harder to debug and understand, because it is all probabilities being multiplied thousands of times, so you have to be careful to test that it is doing what you expect. Naive Bayes does quite well when the training data does not contain every possibility, so it can be very good with small amounts of data. Decision trees work better with lots of data compared to Naive Bayes.
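A small sketch of those probability-style answers, assuming scikit-learn's GaussianNB and made-up features (remember, picking the features is your job):

```python
# Sketch: Gaussian Naive Bayes reports per-class probabilities rather
# than just a hard label. The two features here are invented.
from sklearn.naive_bayes import GaussianNB

X = [[1.0, 5.2], [1.1, 4.9], [3.0, 0.5], [2.9, 0.7], [5.5, 2.2], [5.7, 2.0]]
y = ["A", "A", "B", "B", "C", "C"]

nb = GaussianNB().fit(X, y)
probs = nb.predict_proba([[3.1, 0.6]])[0]
for label, p in zip(nb.classes_, probs):
    print(f"{label}: {p:.1%}")   # output in the "A 90%, B 5%, ..." spirit
```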
Naive Bayes is used a lot in robotics and computer vision, and it handles those tasks quite well. Decision trees perform very poorly in those situations. Teaching a decision tree to recognize poker hands by looking at millions of poker hands works very badly, because royal flushes and four-of-a-kinds occur so rarely that they often get pruned out. If they are pruned out of the resulting tree, it will misclassify those important hands (recall the discussion of deep trees above). Now just think about trying to diagnose cancer with this. Cancer does not occur in the population in large numbers, so it is more likely to get pruned out. The good news is that this can be handled with weights: we weight a winning hand, or having cancer, higher than a losing hand, or not having cancer, and that boosts it up the tree so it will not get pruned out. Again, this is part of the tuning of the resulting tree to the situation that I discussed earlier.
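A sketch of that weighting idea, assuming scikit-learn's `class_weight` parameter and a toy imbalanced dataset (the labels and weights are made up for illustration):

```python
# Sketch: up-weight the rare-but-important class so its splits are worth
# keeping and are less likely to be pruned away.
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy data: the positive class is very rare.
X = [[i] for i in range(100)]
y = ["cancer" if i in (3, 57) else "healthy" for i in range(100)]

weighted_tree = DecisionTreeClassifier(
    class_weight={"cancer": 50, "healthy": 1},   # rare class weighted higher
    random_state=0,
).fit(X, y)
print(weighted_tree.predict([[3]]))
```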
Decision trees are neat because they tell you which inputs are the best predictors of the outputs, so they can often help you find out whether there is a statistical relationship between a given input and the output, and how strong that relationship is. Often the resulting decision tree is less important than the relationships it describes. So decision trees can be used as a research tool while you learn about your data, so that you can build other classifiers.
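A sketch of using a tree as that kind of research tool, assuming scikit-learn and its bundled iris data: the fitted tree's `feature_importances_` give a rough ranking of which inputs predict the output most strongly.

```python
# Sketch: fit a tree, then read off which features it found most predictive.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

for name, importance in sorted(
    zip(data.feature_names, tree.feature_importances_),
    key=lambda pair: pair[1], reverse=True,
):
    print(f"{name}: {importance:.3f}")
```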
If you are torn between decision trees and Naive Bayes for a problem, it is often best to test both. Build a decision tree and build a Naive Bayes classifier, then have a shoot-out using your training and validation data. Whichever performs best will more likely perform better in the field. And it is always a good idea to throw a K-nearest-neighbors (KNN) predictor in against each of them, since KNN has been shown to outperform both in some situations, and KNN is a simple algorithm to implement and use. If KNN performs better than the other two, go with it.
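A sketch of such a shoot-out, assuming scikit-learn and a placeholder dataset/split; substitute your own training and validation data:

```python
# Sketch: train a decision tree, Naive Bayes, and KNN on the same training
# data and compare them on held-out validation data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive bayes": GaussianNB(),
    "knn": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    score = model.fit(X_train, y_train).score(X_val, y_val)
    print(f"{name}: validation accuracy {score:.3f}")
```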
Some sources:
A manual on CART-based decision trees. This book covers the CART algorithm, but also discusses decision trees, weights, missing values, surrogate splits, boosting, etc. http://www.amazon.com/Classification-Regression-Wadsworth-Statistics-Probability/dp/0412048418
A gentler introduction to CART: https://www.youtube.com/watch?v=p17C9q2M00Q
A comparison of algorithms - notice that KNN, decision trees, C4.5, and SVM do quite well in most of the tests. http://www4.ncsu.edu/~arezaei2/paper/JCIT4-184028_Camera%20Ready.pdf
Another comparison of algorithms - boosted decision trees and random forests top the list, with KNN in the middle: http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf
Another good discussion of the trade-offs between various methods: http://www.quora.com/What-are-the-advantages-of-different-classification-algorithms