An interesting question. My answer consists of three parts.
Disclaimer: there is no free lunch, so you can never be sure without checking performance on the real test set labels. In the worst case you have concept drift in your problem, which makes it impossible to predict the target class. However, there are approaches that can give quite good results.
Notation:
Features are denoted by X, the target variable by Y, and the learned classifier by f(X) |-> Y. The distribution of X in D1 is written P(X|D1) (slightly abusing notation).
Class distribution in the test set
You suggested using the distribution of the predicted variable ("check the proportions of the classes predicted from it"). This, however, can only be an indicator. I build classifiers in industry to predict that a machine will fail (predictive maintenance). Plenty of engineers are trying to skew my data: they keep making the machines that produce the data more reliable. This does not cause a problem, even though one class basically disappears; the classifiers are still valid.
There is a very simple answer to the question of how to "fix" the distribution of the target labels in the test set: classify all test instances, then sample data points (with replacement) according to the predicted labels so that the desired distribution of the target variable is obtained. You could additionally try to check the distribution of the features X, but that will not tell you much.
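As a minimal sketch of that resampling step, assuming a hypothetical split of iris into labeled data D1 and unlabeled test features D2 (the classifier choice, the names, and the desired distribution below are illustrative assumptions):

library(e1071)   # example classifier; any model with a predict() method would do

set.seed(1)
train_idx <- sample(nrow(iris), 100)
D1 <- iris[train_idx, ]                  # labeled training data (stand-in)
D2 <- iris[-train_idx, 1:4]              # unlabeled test features (stand-in)

f    <- svm(Species ~ ., data = D1)
pred <- predict(f, D2)                   # predicted labels for the test set

# assumed target class distribution for the resampled test set
desired <- c(setosa = 0.2, versicolor = 0.4, virginica = 0.4)

# sample test instances (with replacement) so the predicted labels follow 'desired'
idx <- unlist(lapply(names(desired), function(cl) {
  pool <- which(pred == cl)
  n    <- round(desired[[cl]] * nrow(D2))
  if (length(pool) == 0 || n == 0) return(integer(0))
  pool[sample.int(length(pool), n, replace = TRUE)]
}))
D2_resampled <- D2[idx, ]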
Can the skew itself be a problem? Indeed it can, because the classifier typically tries to optimize accuracy, the F1 measure, or some other statistical property. If you knew the distribution in D2 in advance, you could supply a cost function that minimizes the expected cost under that distribution. These costs can be used to resample (or reweight) the training data, as mentioned in another answer, but some learning algorithms also provide more elaborate ways to incorporate this information.
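For instance, e1071's svm() accepts per-class weights; a rough sketch of reweighting toward an assumed D2 class distribution (the numbers in p_test are made up for illustration):

library(e1071)

# class distribution observed in training versus the one assumed for D2
p_train <- prop.table(table(iris$Species))
p_test  <- c(setosa = 0.1, versicolor = 0.1, virginica = 0.8)   # assumption

# weight classes so that training effectively mimics the assumed test distribution
w <- p_test / as.numeric(p_train[names(p_test)])
model <- svm(Species ~ ., data = iris, class.weights = w)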
Outlier detection
The question is whether it is possible to detect that something has changed in the inputs X. This is quite important, because it can indicate that you are working with the wrong data. You can apply fairly simple tests, for example comparing the mean and the distribution in every dimension. However, this ignores the dependencies between the variables.
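A minimal sketch of such per-dimension checks, here with a two-sample Kolmogorov-Smirnov test on each numeric feature (D1 and D2 are stand-ins built from iris and deliberately differ):

# stand-ins: pretend the first 75 iris rows are D1 and the rest are D2
D1 <- iris[1:75, 1:4]
D2 <- iris[76:150, 1:4]

# compare means and run a two-sample Kolmogorov-Smirnov test per dimension
drift_check <- sapply(names(D1), function(col) {
  ks <- ks.test(D1[[col]], D2[[col]])
  c(mean_D1 = mean(D1[[col]]),
    mean_D2 = mean(D2[[col]]),
    ks_p    = ks$p.value)
})
round(t(drift_check), 3)   # small p-values suggest the marginal distribution changed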
For the following two illustrations I am using the iris data set.
Two techniques come to mind that let you detect that something in the data has changed. The first is based on PCA. It works only for numerical features, but there are similar ideas for categorical ones. PCA lets you transform the input into a lower-dimensional space: PCA(X, t) = PCA([X_1, ..., X_n], t) = [Comp_1, ..., Comp_m] = Comp, with projection t, where usually m << n. This transformation is still approximately invertible, so PCA^-1(Comp, t) = X' and the error MSE(X, X') is small. To detect a problem you can monitor this error: as soon as it increases, you can say that you mistrust your predictions.
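A minimal sketch of that monitoring idea (the complete chart code is at the end of this answer; the function name and the threshold choice below are my own assumptions):

# PCA fitted on the (scaled) training features; keep the first m components
pca <- prcomp(iris[1:100, 1:4], center = TRUE, scale. = TRUE)
m   <- 2

reconstruction_error <- function(newdata, pca, m) {
  scores <- predict(pca, newdata)[, 1:m, drop = FALSE]
  # invert the projection and undo the scaling/centering
  recon  <- t(t(scores %*% t(pca$rotation[, 1:m])) * pca$scale + pca$center)
  rowMeans((as.matrix(newdata) - recon)^2)          # per-instance MSE(X, X')
}

# monitor new data: flag instances whose error exceeds what was seen in training
threshold  <- quantile(reconstruction_error(iris[1:100, 1:4], pca, m), 0.99)
err        <- reconstruction_error(iris[101:150, 1:4], pca, m)
suspicious <- err > threshold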
If I fit a PCA on all data from versicolor and virginica and plot the reconstruction error in two dimensions (the PCA is computed on all four iris features), I get
[plot: PCA reconstruction error when the PCA was fitted on versicolor and virginica]
However, if versicolor is the new data, the results are less convincing.
[plot: PCA reconstruction error when versicolor is the unseen data]
However, a PCA (or something similar) is often computed for numerical data anyway, so it can give a good indication without much overhead.
The second technique I am aware of is based on so-called one-class support vector machines. Whereas an ordinary support vector machine builds a classifier that tries to separate the two target classes of Y, a one-class support vector machine tries to separate seen from unseen data. Using this technique is quite attractive if you use support vector machines for classification anyway: you essentially get two classifications. The first predicts the target, and the second indicates whether similar data has been seen before.
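A minimal sketch of that two-classifier idea with e1071 (the plotting variant appears in the chart code at the end; the feature choice here is just for illustration):

library(e1071)

train <- iris[iris$Species %in% c("setosa", "virginica"), ]
train$Species <- droplevels(train$Species)

# first classifier: predicts the target class
clf <- svm(Species ~ Petal.Width + Petal.Length, data = train)
# second classifier: flags data that does not resemble the training data
ocl <- svm(train[, c("Petal.Width", "Petal.Length")], type = "one-classification")

new_data   <- iris[, c("Petal.Width", "Petal.Length")]
prediction <- predict(clf, new_data)   # target prediction
known      <- predict(ocl, new_data)   # TRUE if the point looks like training data

# only trust predictions where the one-class SVM says the data looks familiar
table(prediction, known)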
If I build a one-class classifier on setosa and virginica and color by novelty, I get the following graph:
[plot: iris data colored by the one-class SVM novelty prediction]
As you can see, the data from versicolor looks suspicious. In this case it is a new class. However, if we assumed these were examples of virginica, they would be drifting dangerously close to the hyperplane.
Semi-supervised and transductive learning
To address your underlying problem, the idea of transductive learning, a special case of semi-supervised learning, may be of interest. In semi-supervised learning the training set consists of two parts: labeled data and unlabeled data. Semi-supervised methods use all of this data to build the classifier. Transductive learning is the special case in which the unlabeled data is your test data D2. The idea was framed by Vapnik as "do not try to solve a more complex problem [building a classifier for all possible data] when you want to solve a simpler problem [predicting the labels for D2]".
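Proper transductive SVMs need dedicated tooling, but as a rough illustration of the semi-supervised idea, here is a simple self-training loop (this is not Vapnik's method; the confidence cut-off and all names are assumptions):

library(e1071)

set.seed(2)
train_idx <- sample(nrow(iris), 30)
D1 <- iris[train_idx, ]            # small labeled training set
D2 <- iris[-train_idx, 1:4]        # unlabeled test features

model     <- svm(Species ~ ., data = D1, probability = TRUE)
unlabeled <- D2

for (i in 1:3) {                   # a few self-training rounds
  if (nrow(unlabeled) == 0) break
  pred  <- predict(model, unlabeled, probability = TRUE)
  probs <- attr(pred, "probabilities")
  conf  <- apply(probs, 1, max)

  confident <- conf > 0.95         # assumed confidence cut-off
  if (!any(confident)) break

  # add confidently pseudo-labeled test points to the training data and refit
  pseudo    <- cbind(unlabeled[confident, ], Species = pred[confident])
  D1        <- rbind(D1, pseudo)
  unlabeled <- unlabeled[!confident, , drop = FALSE]
  model     <- svm(Species ~ ., data = D1, probability = TRUE)
}

final_labels <- predict(model, D2)   # labels for the full test set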
R code for the charts
library(ggplot2)
library(e1071)

# scatter plot of the raw iris data
ggplot(iris)+aes(x=Petal.Width,y=Petal.Length,color=Species)+geom_point()+stat_ellipse()

# one-class SVM trained on setosa and virginica only
ocl <- svm(iris[iris$Species %in% c("virginica","setosa"),3:4],type="one-classification")
coloring <- predict(ocl,iris[,3:4],decision.values=TRUE)

# novelty flag and decision values for all points
ggplot(iris)+aes(x=Petal.Width,y=Petal.Length,color=coloring)+geom_point()+stat_ellipse()
ggplot(iris)+aes(x=Petal.Width,y=Petal.Length)+
  geom_point(color=rgb(red=0.8+0.1*attr(coloring,"decision.values"),
                       green=rep(0,150),
                       blue=1-(0.8+0.1*attr(coloring,"decision.values"))))

# PCA reconstruction error
pca <- prcomp(iris[,3:4])
#pca <- prcomp(iris[iris$Species %in% c("virginica","setosa"),1:4], retx = TRUE, scale = TRUE)
pca <- prcomp(iris[iris$Species %in% c("virginica","setosa"),1:4], retx = TRUE, scale = TRUE, tol = 0.2)
pca <- prcomp(iris[iris$Species %in% c("virginica","versicolor"),1:4], retx = TRUE, scale = TRUE, tol = 0.4)
predicted <- predict(pca,iris[,1:4])
inverted <- t(t(predicted %*% t(pca$rotation)) * pca$scale + pca$center)
ggplot(inverted[,3:4]-iris[,3:4])+aes(x=Petal.Width,y=Petal.Length,color=iris$Species)+geom_point()+stat_ellipse()