Implementing 3D vectors in Python: numpy vs x, y, z fields

I am implementing a 3D Vector class in Python. My vector has x, y, and z coordinates (all floats), and I need to decide how to store this information. I see at least three options here:

1) Make three separate float fields: self.x, self.y, self.z

    class Vector:
        def __init__(self, x, y, z):
            self.x = x
            self.y = y
            self.z = z

2) Make a list, say self.data, with three elements. I could also use a tuple if the objects can be immutable.

    class Vector:
        def __init__(self, x, y, z):
            self.data = [x, y, z]

3) Create a numpy array, say self.data, with three elements.

    import numpy as np

    class Vector:
        def __init__(self, x, y, z):
            self.data = np.array([x, y, z])

For options (2) and (3), I could then implement properties and setters to access the individual coordinates:

    @property
    def x(self):
        return self.data[0]

4) Why not have some redundancy? I could have both a list (or a tuple, or a numpy array), and separate fields x, y and z.

The class is meant to perform common operations such as vector addition, scalar product, cross product, rotation, etc., so the choice of storage has to take these operations into account.

Is there a solution I should prefer and why?

+5
3 answers

There are several aspects to this question, and I can give you some hints on how they could be resolved. Note that these are meant as suggestions; you definitely need to see which one you like best.

Supporting Linear Algebra

You mentioned that you want to support linear algebra, such as vector addition (element-wise addition), cross product and scalar product. These are available for numpy.ndarray, so you can choose between different approaches to support them:

  • Just use numpy.ndarray and don't worry about your class:

        import numpy as np

        vector1, vector2 = np.array([1, 2, 3]), np.array([3, 2, 1])
        np.add(vector1, vector2)    # vector addition
        np.cross(vector1, vector2)  # cross product
        np.inner(vector1, vector2)  # inner product

    There is no built-in vector rotation in numpy, but several sources describe it, such as "3D Rotation", so you will need to implement it yourself.

  • You can create a class, no matter how you store your attributes, and provide an __array__ method. This way you can support (all) numpy functions as if your instances were numpy.ndarrays themselves:

        import numpy as np

        class VectorArrayInterface(object):
            def __init__(self, x, y, z):
                self.x, self.y, self.z = x, y, z

            def __array__(self, dtype=None):
                # Called by numpy whenever it needs an array view of the instance.
                if dtype:
                    return np.array([self.x, self.y, self.z], dtype=dtype)
                else:
                    return np.array([self.x, self.y, self.z])

        vector1, vector2 = VectorArrayInterface(1, 2, 3), VectorArrayInterface(3, 2, 1)
        np.add(vector1, vector2)    # vector addition
        np.cross(vector1, vector2)  # cross product
        np.inner(vector1, vector2)  # inner product

    This returns the same results as in the first case, so you can provide an interface for numpy functions without storing a numpy array. If you do have a numpy array stored in your class, the __array__ method can simply return it, which could be an argument for storing your x, y and z as a numpy.ndarray internally (because the conversion is then basically "free").

  • You can subclass np.ndarray. I will not go into details here because it is an advanced topic that could easily justify a whole answer on its own. If you are really considering this, you should take a look at the official documentation on "Subclassing ndarray". I do not recommend it: I have worked on several classes that subclass np.ndarray, and there are several "rough edges" along the way.

  • You can implement the operations you need yourself. That is reinventing the wheel, but it is educational and fun if there are only a few of them. I would not recommend this for serious production code, because here too there are several "rough edges" that have already been dealt with in the numpy functions. For example, overflow or underflow problems, correctness of the functions, ...

    A possible implementation (without rotation) might look like this (this time with an internally stored list):

        class VectorList(object):
            def __init__(self, x, y, z):
                self.vec = [x, y, z]

            def __repr__(self):
                return '{self.__class__.__name__}(x={self.vec[0]}, y={self.vec[1]}, z={self.vec[2]})'.format(self=self)

            def __add__(self, other):
                x1, y1, z1 = self.vec
                x2, y2, z2 = other.vec
                return VectorList(x1+x2, y1+y2, z1+z2)

            def crossproduct(self, other):
                x1, y1, z1 = self.vec
                x2, y2, z2 = other.vec
                # third component is x1*y2 - y1*x2
                return VectorList(y1*z2 - z1*y2,
                                  z1*x2 - x1*z2,
                                  x1*y2 - y1*x2)

            def scalarproduct(self, other):
                x1, y1, z1 = self.vec
                x2, y2, z2 = other.vec
                return x1*x2 + y1*y2 + z1*z2

    Note: you can implement these hand-coded methods and also implement the __array__ method I mentioned earlier. This way you can support any function that expects a numpy.ndarray and still have your own homegrown methods. These approaches are not mutually exclusive, but you will get different results: the methods above return a scalar or a VectorList, whereas if you go through __array__ you will get a numpy.ndarray back.

  • Use a library that contains a 3D vector. In some sense this is the easiest way; in other respects it can be very complicated. On the plus side, an existing class will probably work out of the box and is probably optimized for performance. On the other hand, you need to find an implementation that supports your use case, you need to read the documentation (or figure out how it works by other means), and you may hit bugs or limitations that turn out to be disastrous for your project. Oh, and you get an additional dependency, and you need to check whether its license is compatible with your project. In addition, if you copy the implementation (check whether the license allows that!), you need to maintain the foreign code (even if that just means keeping it in sync).
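
Since numpy has no built-in vector rotation, one possible way to write it yourself is Rodrigues' rotation formula. The following is only a minimal sketch: the function name `rotate` and its parameters are made up for this illustration and are not part of numpy. Because it converts its inputs with `np.asarray`, it should also accept plain lists and objects that provide `__array__`.

```python
import numpy as np

def rotate(vector, axis, angle):
    """Rotate `vector` about `axis` by `angle` radians (Rodrigues' formula).

    Hypothetical helper for illustration; not a numpy function.
    """
    v = np.asarray(vector, dtype=float)
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)  # the formula requires a unit-length axis
    return (v * np.cos(angle)
            + np.cross(k, v) * np.sin(angle)
            + k * np.dot(k, v) * (1.0 - np.cos(angle)))

# Rotating the x unit vector by 90 degrees about the z axis
rotated = rotate([1.0, 0.0, 0.0], axis=[0.0, 0.0, 1.0], angle=np.pi / 2)
print(rotated)
```

The result is a numpy.ndarray, so a class that wants to hand back its own vector type would have to wrap it again.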

Performance

Performance in this case is tricky. The use cases you mention are quite simple, and each operation should take on the order of microseconds, so you should already be able to perform several thousand to millions of operations per second, assuming you do not introduce an unnecessary bottleneck! However, you can micro-optimize the operations.

Let me start with some general tips:

  • Avoid mixed numpy.ndarray <-> list/float operations. These conversions are expensive! If most operations use numpy.ndarray, you do not want to store your values in a list or as separate attributes. Likewise, if you want to access the individual values of the Vector, iterate over them, or operate on them as a list, then store them as a list or as separate attributes.

  • Using numpy to operate on three values is relatively inefficient. numpy.ndarray is great for big arrays because it can store the values more efficiently (in terms of space) and scales much better than pure-Python operations. However, these advantages come with some overhead, which is significant for small arrays (say length << 100, which is an educated guess, not a fixed number!). A pure-Python solution (I use the one I already presented above) can be much faster than a numpy solution for such small arrays:

        import numpy as np

        class VectorArray:
            def __init__(self, x, y, z):
                self.data = np.array([x, y, z])

        # addition: python solution 3 times faster
        %timeit VectorList(1, 2, 3) + VectorList(3, 2, 1)
        # 100000 loops, best of 3: 9.48 µs per loop
        %timeit VectorArray(1, 2, 3).data + VectorArray(3, 2, 1).data
        # 10000 loops, best of 3: 35.6 µs per loop

        # cross product: python solution 16 times faster
        v = VectorList(1, 2, 3)
        a = np.array([1, 2, 3])  # using a plain array to avoid the class overhead
        %timeit v.crossproduct(v)
        # 100000 loops, best of 3: 5.27 µs per loop
        %timeit np.cross(a, a)
        # 10000 loops, best of 3: 84.9 µs per loop

        # inner product: python solution 4 times faster
        %timeit v.scalarproduct(v)
        # 1000000 loops, best of 3: 1.3 µs per loop
        %timeit np.inner(a, a)
        # 100000 loops, best of 3: 5.11 µs per loop

    However, as I said, these timings are on the order of microseconds, so this is literally micro-optimization. Still, if your focus is on optimal performance of your class, you can be faster with pure Python and self-implemented functions.

    As soon as you do a lot of linear algebra operations, you should leverage numpy's vectorized operations. Most of these are incompatible with a class like the one you describe, and a completely different approach may be appropriate: for example, a class that stores an array of vectors (a multidimensional array) in a way that interacts correctly with numpy's functions! But I think that is outside the scope of this answer and would not really address your question, which was limited to a class that stores only 3 values.

  • I did some benchmarks using the same methods with different approaches, but that is a bit of a cheat. In general you should not time a single function call; you should measure the execution time of the whole program. In a program, a tiny speed difference in a function that is called millions of times can make a much bigger overall difference than a big speed difference in a method that is called only a few times ... or not! I can only provide timings for functions because you did not share your program or use cases, so you need to find out which approach works best (in correctness and performance) for you.
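
Outside IPython, where `%timeit` is not available, measurements like the ones above can be reproduced with the standard-library `timeit` module. The sketch below is only an illustration under that assumption: `add_pure` is a made-up helper, and the numbers it prints depend entirely on your machine and numpy version, so they will not match the timings quoted above.

```python
import timeit

import numpy as np

def add_pure(a, b):
    """Element-wise addition of two 3-element sequences in pure Python."""
    return [a[0] + b[0], a[1] + b[1], a[2] + b[2]]

list_a, list_b = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
arr_a, arr_b = np.array(list_a), np.array(list_b)

n = 50_000  # number of repetitions per measurement
t_pure = timeit.timeit(lambda: add_pure(list_a, list_b), number=n)
t_numpy = timeit.timeit(lambda: arr_a + arr_b, number=n)

print(f"pure Python: {t_pure / n * 1e6:.3f} µs per call")
print(f"numpy:       {t_numpy / n * 1e6:.3f} µs per call")
```

For a whole program, a profiler such as the standard-library `cProfile` is usually a better starting point than timing isolated calls.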

Conclusion

There are several other factors to consider when deciding which approach is best, but these are more "meta" reasons that are not directly related to your program.

  • Reinventing the wheel (implementing the functions yourself) is an opportunity to learn. You have to make sure it works correctly, you can time it, and if it is too slow you can try various ways to optimize it. You start thinking about algorithmic complexity, constant factors, correctness ... instead of thinking about "which function will solve my problem" or "how can I make this numpy function solve my problem correctly".

  • Using NumPy for length-3 arrays is probably like "shooting at flies with cannons", but it is a great opportunity to become more familiar with numpy functions, and in the future you will know more about how NumPy works (vectorization, indexing, broadcasting, ...), even if NumPy is not a good fit for this particular question.

  • Try different approaches and see how far you get. I learned a lot by answering this question, and it was fun to try the approaches: comparing the results for discrepancies, timing the method calls and evaluating their limitations!

+4

Given how the Vector class will be used, I would prefer option 3. Since it stores numpy arrays, vector operations are relatively easy, intuitive and fast using numpy.

    In [81]: v1 = Vector(1.0, 2.0, 3.0)

    In [82]: v2 = Vector(0.0, 1.0, 2.0)

    In [83]: v1.data + v2.data
    Out[83]: array([1., 3., 5.])

    In [85]: np.inner(v1.data, v2.data)
    Out[85]: 8.0

These operations are already well optimized for performance in numpy.
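
To avoid reaching into `.data` on every call, option 3 can be combined with properties and a few operator methods that delegate to numpy. This is only a sketch of one possible layout: the method names `dot` and `cross` are assumptions made for this example and do not come from the question.

```python
import numpy as np

class Vector:
    """3D vector that stores its coordinates in a numpy array (option 3)."""

    def __init__(self, x, y, z):
        self.data = np.array([x, y, z], dtype=float)

    # Properties expose the individual coordinates of the stored array.
    @property
    def x(self):
        return self.data[0]

    @x.setter
    def x(self, value):
        self.data[0] = value

    @property
    def y(self):
        return self.data[1]

    @y.setter
    def y(self, value):
        self.data[1] = value

    @property
    def z(self):
        return self.data[2]

    @z.setter
    def z(self, value):
        self.data[2] = value

    def __repr__(self):
        return f"Vector(x={self.x}, y={self.y}, z={self.z})"

    def __add__(self, other):
        return Vector(*(self.data + other.data))

    def dot(self, other):
        return float(np.dot(self.data, other.data))

    def cross(self, other):
        return Vector(*np.cross(self.data, other.data))

v1 = Vector(1.0, 2.0, 3.0)
v2 = Vector(0.0, 1.0, 2.0)
total = v1 + v2
inner = v1.dot(v2)
normal = v1.cross(v2)
```

Each operation here creates a new small array and a new Vector, so this layout favors convenience over raw speed for single vectors.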

+1

If simple vector-type behavior is your goal, definitely stick with a pure numpy solution. There are many reasons for this:

  • numpy already has ready-made solutions for all the basic behaviors that you describe (cross-products and much more).
  • it will be faster by leaps and bounds for arrays of significant size (i.e. where it matters)
  • vectorized / array syntax tends to be much more compact and expressive once you get used to it and have some experience with it.
  • and most importantly: the whole numpy / scipy ecosystem is built around the interface provided by ndarray; all libraries speak the common language of ndarray; interfacing with them using your own custom vector type means entering a world of pain.
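
As a rough illustration of the vectorized style referred to above, many 3D vectors can be stored as the rows of a single array of shape (N, 3) and processed in one call. The array names below are made up for this sketch.

```python
import numpy as np

# Three 3D vectors per array, one vector per row (shape (3, 3) here).
a = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 2.0, 3.0]])
b = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [3.0, 2.0, 1.0]])

sums = a + b                        # row-wise vector addition
crosses = np.cross(a, b)            # row-wise cross products
dots = np.einsum("ij,ij->i", a, b)  # row-wise scalar products

print(sums)
print(crosses)
print(dots)
```
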
+1

Source: https://habr.com/ru/post/1267017/

