What you are doing is extracting bottleneck features from the images you feed to the model. The shape you get, (496, 4, 4, 512), is (n_samples, feature_height, feature_width, feature_channels). You removed the dense layers of the model by passing include_top=False, so the model outputs convolutional feature maps instead of class predictions.

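The 4 is the spatial size of the feature maps, which depends on the input size (for example, 150x150 images give 4x4, while the 224x224 images that VGG16 was trained on give 7x7).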
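The shape (496, 4, 4, 512) can be checked by hand. The sketch below is a minimal illustration, assuming VGG16's standard layout: convolutions that keep the spatial size, five 2x2 max-pooling stages, and 512 channels in the last block. The helper name vgg16_bottleneck_shape is made up for this example and is not part of Keras.

```python
def vgg16_bottleneck_shape(n_samples, height, width):
    """Output shape of VGG16 with include_top=False (hypothetical helper, not a Keras API)."""
    # Assumption: the convolutions keep the spatial size ('same' padding),
    # and each of the five 2x2 max-pooling stages halves it, rounding down.
    for _ in range(5):
        height //= 2
        width //= 2
    # The last convolutional block of VGG16 has 512 channels.
    return (n_samples, height, width, 512)


print(vgg16_bottleneck_shape(496, 150, 150))  # (496, 4, 4, 512)
print(vgg16_bottleneck_shape(496, 224, 224))  # (496, 7, 7, 512)
```

So an output of 4x4 is what you would expect from inputs of roughly 128 to 159 pixels per side.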
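If you want a classifier, one option is to freeze the pretrained layers and put your own dense layers on top, for example: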
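from keras import applications
from keras.layers import Dense, Flatten
from keras.models import Model
base = applications.VGG16(include_top=False, weights='imagenet', input_shape=(150, 150, 3))  # input size is an assumption
for layer in base.layers:
    layer.trainable = False  # freeze the pretrained convolutional layers
x = Flatten()(base.output)  # 4x4x512 feature maps -> one vector per image
x = Dense(512, activation='relu')(x)
x = Dense(number_of_classes, activation='softmax')(x)
model = Model(inputs=base.input, outputs=x)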
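After compiling the model, train it with model.fit(X, Y), where X is your 496 images and Y holds their labels.
Then model.predict returns class probabilities rather than bottleneck features.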
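Because the last layer is a softmax, the labels and the predictions are both arrays with one column per class. The sketch below is only an illustration with NumPy and made-up data: the labels, the probabilities array and number_of_classes = 3 are assumptions, not taken from the question. It one-hot encodes integer labels into a Y suitable for a softmax output trained with categorical cross-entropy, and converts predicted probabilities back to class indices.

```python
import numpy as np

number_of_classes = 3  # assumption for this example

# Integer class labels for five hypothetical images.
labels = np.array([0, 2, 1, 2, 0])

# One-hot encode: row i has a 1 in column labels[i].
Y = np.eye(number_of_classes)[labels]

# Stand-in for the output of model.predict: one row of class
# probabilities per image (made-up numbers, each row sums to 1).
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
])

# The predicted class is the column with the highest probability.
predicted_classes = probabilities.argmax(axis=1)

print(Y.shape)            # (5, 3)
print(predicted_classes)  # [0 2 1]
```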