This is a conceptual question that takes many articles and textbooks to answer properly. However, as discussed in the comments, I will try to elaborate on feature extraction. Feature descriptors must be robust against scaling, translation, and rotation. This robustness is called invariance, and such descriptors are called invariant features. Moments and the quantities derived from them, for example, are among the best-known features that are invariant to rotation, scaling, and translation. You can find a use of Hu moments described in this document.

Detecting a flame or fire is a different matter. The features that characterize a flame can be extracted from the dynamic texture of the fire. For example, a fire has a distinctive color texture that makes it stand out from the background. Conventional flame detectors use infrared sensors. In image processing, that is, in the RGB world, we can do something similar by considering the nature of the flame itself. Flames radiate a significant portion of their energy as heat and infrared radiation, so we can expect the flame to dominate the red channel. See the following image, for example:

In the processed image, the red channel has been converted to a black-and-white (binary) image by applying a threshold. To make this clearer, I have split the image into its three channels, as shown below.
R:
G:
B: 
Clearly, the red channel says the most about the flame. We can therefore conclude that a flame is where the R channel carries most of the information, followed by the G channel and finally the B channel. See this.
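The R > G > B observation can be turned into a simple per-pixel candidate test. The following is a minimal sketch, not a tuned detector: the function name `flame_candidate_mask`, the threshold of 180, and the nested-list image format are assumptions made for illustration. With OpenCV you would work on the array returned by `cv2.imread` instead, keeping in mind that it stores channels in B, G, R order.

```python
def flame_candidate_mask(image, red_threshold=180):
    """Return a binary mask marking pixels that look flame-like.

    `image` is a list of rows, each row a list of (R, G, B) tuples with
    values in 0..255. A pixel is kept when its red value reaches the
    (assumed) threshold and its channels are ordered R > G > B.
    """
    mask = []
    for row in image:
        mask.append([
            1 if (r >= red_threshold and r > g > b) else 0
            for (r, g, b) in row
        ])
    return mask


# Tiny hand-made image used only for illustration.
image = [
    [(255, 160, 40), (30, 60, 120)],    # flame-like orange, bluish background
    [(255, 255, 255), (120, 60, 20)],   # white highlight, dim reddish brown
]
print(flame_candidate_mask(image))  # [[1, 0], [0, 0]]
```

Only the orange pixel survives: the white pixel has a strong red value but no R > G > B ordering, and the brown pixel is too dim. Regions that pass this test are only candidates; the classifier still has to accept or reject them.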
Thus, your feature vector will be a three-dimensional vector, for example one value per RGB channel for a flame patch. An SVM classifier can now be put to use. A video may sometimes contain segments that resemble flames; these must be rejected, otherwise they will cause false alarms. The SVM helps you accept or reject a flame candidate. To train your support vector machine, collect some true flames and some images that your extractor might mistake for flames. Then label them as positive and negative samples. Finally, let OpenCV do the magic and train it. For more information on SVMs, watch this lecture by Patrick Winston (MIT) on YouTube.
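To make the accept/reject step concrete, here is a minimal sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is a stand-in written in plain Python so that it runs without dependencies; it is not OpenCV's trainer (in practice you would likely use `cv2.ml.SVM_create()` and its `train` method). The function names, the hyperparameters, and the six feature triplets are all made up for illustration.

```python
def train_linear_svm(samples, labels, lr=0.1, reg=0.01, epochs=200):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    `labels` must be +1 (flame) or -1 (not flame).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:
                # Sample violates the margin: shrink w and step toward it.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is outside the margin: only apply regularization.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 for 'flame' and -1 for 'not flame'."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Made-up (red, green, blue) feature triplets, values in [0, 1].
positives = [(0.9, 0.5, 0.1), (0.8, 0.6, 0.2), (0.85, 0.4, 0.15)]   # true flames
negatives = [(0.1, 0.2, 0.9), (0.2, 0.3, 0.8), (0.15, 0.25, 0.7)]   # look-alikes
samples = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

w, b = train_linear_svm(samples, labels)
print(predict(w, b, (0.88, 0.5, 0.12)))   # candidate with a strong red response
print(predict(w, b, (0.12, 0.2, 0.85)))   # candidate with a strong blue response
```

Real flame data will not be this cleanly separable, which is why collecting hard negative samples (the look-alikes) matters more than the choice of trainer.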
UPDATE ---- Since you are interested in building feature vectors, here is an example. Suppose that the R, G, B channels are cleanly separated, so that they can be treated as statistically independent, as in the following. (This is not true of real images, in which the R, G, B planes are not statistically independent.)
A patch in the RGB image therefore has three representations, one in each of the R, G, B channels. A flame, for example, produces one spot in each of the R, G, B planes. Here, as an example, the area of each spot is measured. Call the flame patch in the RGB image "A".
The representations of patch A in the R, G, B planes are depicted above. A_r, A_g, A_b denote the areas corresponding to A in the R, G, B planes, respectively.
Patch A is therefore represented by the triplet (A_r, A_g, A_b), a point in three-dimensional (xyz) space. The SVM takes this vector as input and decides whether it represents a real flame.
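As a rough sketch of how such a triplet could be computed, the snippet below measures, for a candidate bounding box, the fraction of pixels in each channel that exceed a threshold. The function name `area_triplet`, the box convention, and the threshold of 128 are assumptions for illustration; other definitions of the per-channel area are equally valid.

```python
def area_triplet(image, box, threshold=128):
    """Return (A_r, A_g, A_b) for the candidate box.

    `image` is a list of rows of (R, G, B) tuples with values in 0..255.
    `box` is (top, left, bottom, right) with exclusive bottom/right.
    Each area is the fraction of box pixels whose channel value is at
    least `threshold`, so every component lies in [0, 1].
    """
    top, left, bottom, right = box
    counts = [0, 0, 0]
    for row in image[top:bottom]:
        for pixel in row[left:right]:
            for channel in range(3):
                if pixel[channel] >= threshold:
                    counts[channel] += 1
    total = (bottom - top) * (right - left)
    return tuple(count / total for count in counts)


# Hand-made 2x4 image: a flame-like patch on the left, background on the right.
image = [
    [(255, 200, 150), (230, 140, 30), (20, 30, 40), (20, 30, 40)],
    [(200, 90, 20), (210, 100, 10), (20, 30, 40), (20, 30, 40)],
]
print(area_triplet(image, (0, 0, 2, 2)))  # (1.0, 0.5, 0.25)
```

The spot fills the whole box in the red plane, half of it in the green plane, and a quarter in the blue plane, which matches the R, then G, then B ordering described earlier.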
Area, in normalized form, is only one of many geometric features that you can use in your decision-making process. Other useful features of this kind are aspect ratios, moments, etc.
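To illustrate why moments are attractive, here is a small sketch that computes the first Hu invariant, phi_1 = eta_20 + eta_02, for a binary mask in plain Python. The helper names and the test rectangles are made up for illustration; in OpenCV the same quantity is the first entry returned by `cv2.HuMoments(cv2.moments(mask))`.

```python
def first_hu_invariant(mask):
    """Return phi_1 = eta_20 + eta_02 for a binary mask (rows of 0/1)."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    m00 = len(points)                        # area (zeroth moment)
    cx = sum(x for x, _ in points) / m00     # centroid
    cy = sum(y for _, y in points) / m00
    mu20 = sum((x - cx) ** 2 for x, _ in points)   # central moments
    mu02 = sum((y - cy) ** 2 for _, y in points)
    # Normalized central moments: eta_pq = mu_pq / m00 ** (1 + (p + q) / 2),
    # and the exponent equals 2 for p + q = 2.
    return (mu20 + mu02) / m00 ** 2


def rectangle_mask(rows, cols, top, left, height, width):
    """Build a rows x cols mask containing one filled rectangle."""
    return [[1 if top <= y < top + height and left <= x < left + width else 0
             for x in range(cols)] for y in range(rows)]


original = rectangle_mask(8, 8, top=1, left=1, height=4, width=2)
shifted = rectangle_mask(8, 8, top=3, left=5, height=4, width=2)   # translated
rotated = rectangle_mask(8, 8, top=2, left=2, height=2, width=4)   # turned 90 degrees

for mask in (original, shifted, rotated):
    print(first_hu_invariant(mask))
```

All three masks give the same value, so the feature does not change when the spot moves or turns by 90 degrees. On a discrete pixel grid, invariance to scaling and to arbitrary rotation angles holds only approximately.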
In short, you should do the following:
1 - Find candidate regions that look like fire.
2 - Locate each candidate in all three planes R, G, B.
3 - Extract a feature (I suggest moments) in each plane.
4 - Build the feature vector.
5 - Feed this vector to the SVM.
I hope you find this helpful.