Effective Fly Implementation

Question

Effective Fly Implementation

Background

One of the most commonly used data structures in our application is the user-defined Point structure. Recently, we have been dealing with memory problems, mainly due to the excessive number of instances of this structure.

Many of these instances contain the same data. Sharing a single instance will greatly facilitate memory usage. However, since we use structs, instances cannot be shared. It is also impossible to change it to a class, because the semantics of the structure are important.

Our workaround for this is to have a structure containing a single reference to the support class that contains the actual data. These mutational dataclasses are stored and retrieved from the factory to ensure no duplicates.

The narrowed version of the code looks something like this:

public struct PointD { //Factory private static class PointDatabase { private static readonly Dictionary<PointData, PointData> _data = new Dictionary<PointData, PointData>(); public static PointData Get(double x, double y) { var key = new PointData(x, y); if (!_data.ContainsKey(key)) _data.Add(key, key); return _data[key]; } } //Flyweight data private class PointData { private double pX; private double pY; public PointData(double x, double y) { pX = x; pY = y; } public double X { get { return pX; } } public double Y { get { return pY; } } public override bool Equals(object obj) { var other = obj as PointData; if (other == null) return false; return other.X == this.X && other.Y == this.Y; } public override int GetHashCode() { return X.GetHashCode() * Y.GetHashCode(); } } //Public struct public Point(double x, double y) { _data = Point3DDatabase.Get(x, y); } public double X { get { return _data == null ? 0 : _data.X; } set { _data = PointDatabase.Get(value, Y); } } public double Y { get { return _data == null ? 0 : _data.Y; } set { _data = PointDatabase.Get(X, value); } } }

This implementation guarantees the preservation of the semantics of the structure, while only one instance of the same data is supported.

(Please do not mention memory leaks or such, this is simplified sample code)

Problem

Although the above approach works to reduce memory usage, performance is appalling. A project in our application can easily contain a million different points or more. As a result, finding an instance of PointData very expensive. And this search should be performed whenever Point manipulates, which, as you probably guess, is what our application is. As a result, this approach does not suit us.

As an alternative, we could create two versions of the Point class: one with flyweight support, as described above, and one containing its own data (with possible duplicates). All (short-lived) calculations could be done in the second class, while if Point saved for longer periods of time, they could be converted to the first class, which is efficient for memory. However, this means that all users of the Point class must be checked and tuned to this scheme, which is not possible for us.

What we are looking for is an approach that meets the criteria below:

When there are several Point with the same data, the memory usage should be lower than having another instance of the structure for each of them.
Performance should not be much worse than working directly with primitive data in the structure.
The string must be kept.
The "Point" interface should remain the same (for example, classes that use "Point" should not be changed).

Is there a way to improve our approach to these criteria? Or can someone suggest a different approach that we can try to do?

+6

c # design-patterns

Steven Nov 08 '13 at 9:18

source share

1 answer

Gusdor · Answer 1 · 2013-11-08T09:29:24+0000

Instead of reworking the whole data structure and programming model, my solution to the performance and memory problem is caching, pre-fetching and, most importantly, cull data when it’s not needed.

Think of it this way. On the graph, you cannot display several million points at once, because you are running out of pixels (you must occlusion - select these points). Similarly, there is not enough vertical space in the table on the screen (you need to truncate the data set). Consider streaming data from your source file as needed. If your original data structure is not suitable for dynamic searches, consider an intermediate, temporary file format. This is one way .Net JITer is so fast!

Effective Fly Implementation

More articles: