This is a complete revision of my previous answer. It turns out that in my previous attempts I overlooked a much simpler method based on a combination of packed arrays and sparse arrays, which is much faster and more efficient in terms of memory than all previous methods (at least in the range of sample sizes, where I tested it), although minimally SubValues original SubValues approach. Since the question was asked about the most efficient method, I will remove others from the answer (given that they are quite complex and take up a lot of space. Those who would like to see them can check past versions of this answer).
Original SubValues
Let's start by introducing the function to create test configurations for us. There he is:
Clear[generateConfigurations]; generateConfigurations[maxIndex_Integer, maxConfX_Integer, maxConfY_Integer, nconfs_Integer] := Transpose[{ RandomInteger[{1, maxIndex}, nconfs], Transpose[{ RandomInteger[{1, maxConfX}, nconfs], RandomInteger[{1, maxConfY}, nconfs] }]}];
We can create a small sample to illustrate:
In[3]:= sample = generateConfigurations[2,2,2,10] Out[3]= {{2,{2,1}},{2,{1,1}},{1,{2,1}},{1,{1,2}},{1,{1,2}}, {1,{2,1}},{2,{1,2}},{2,{2,2}},{1,{2,2}},{1,{2,1}}}
Here we have only 2 indexes and configurations, where both the numbers "x" and "y" vary from 1 to 2 - only 10 such configurations.
The following function will help us mimic the accumulation of frequencies for configurations as we increase the counters based on SubValues for repeating ones:
Clear[testAccumulate]; testAccumulate[ff_Symbol, data_] := Module[{}, ClearAll[ff]; ff[_][_] = 0; Do[ doSomeStuff; ff[#1][#2]++ & @@ elem; doSomeMoreStaff; , {elem, data}]];
The symbols doSomeStuff and doSomeMoreStaff are shown here to indicate some code that may exclude or follow the count code. The data parameter should be a list of the form created by generateConfigurations . For instance:
In[6]:= testAccumulate[ff,sample]; SubValues[ff] Out[7]= {HoldPattern[ff[1][{1,2}]]:>2,HoldPattern[ff[1][{2,1}]]:>3, HoldPattern[ff[1][{2,2}]]:>1,HoldPattern[ff[2][{1,1}]]:>1, HoldPattern[ff[2][{1,2}]]:>1,HoldPattern[ff[2][{2,1}]]:>1, HoldPattern[ff[2][{2,2}]]:>1,HoldPattern[ff[_][_]]:>0}
The following function extracts the resulting data (indices, configurations, and their frequencies) from the SubValues list:
Clear[getResultingData]; getResultingData[f_Symbol] := Transpose[{#[[All, 1, 1, 0, 1]], #[[All, 1, 1, 1]], #[[All, 2]]}] &@ Most@SubValues [f, Sort -> False];
For instance:
In[10]:= result = getResultingData[ff] Out[10]= {{2,{2,1},1},{2,{1,1},1},{1,{2,1},3},{1,{1,2},2},{2,{1,2},1}, {2,{2,2},1},{1,{2,2},1}}
To end the data processing cycle, there is a direct function to extract data for a fixed index based on Select :
Clear[getResultsForFixedIndex]; getResultsForFixedIndex[data_, index_] := If[# === {}, {}, Transpose[#]] &[ Select[data, First@ # == index &][[All, {2, 3}]]];
In our test case
In[13]:= getResultsForFixedIndex[result,1] Out[13]= {{{2,1},{1,2},{2,2}},{3,2,1}}
This seems to be close to what @zorank tried in code.
Faster solution based on arrays and sparse arrays
As @zorank noted, this becomes slow for a larger sample with more indexes and configurations. Now we will create a large sample to illustrate this ( note! This requires about 4-5 GB of RAM, so you may need to reduce the number of configurations if it exceeds the available RAM ):
In[14]:= largeSample = generateConfigurations[20,500,500,5000000]; testAccumulate[ff,largeSample];
Now we will extract the full data from SubValues of ff :
In[16]:= (largeres = getResultingData[ff]);
This takes some time, but you only need to do this once. But when we start retrieving data for a fixed index, we see that it is rather slow:
In[24]:= getResultsForFixedIndex[largeres,10]
The main idea that we will use here to speed it up is to pack separate lists inside largeres , for indices, combinations and frequencies. Although the complete list cannot be packaged, these parts individually can:
In[18]:= Timing[ subIndicesPacked = Developer`ToPackedArray[largeres[[All,1]]]; subCombsPacked = Developer`ToPackedArray[largeres[[All,2]]]; subFreqsPacked = Developer`ToPackedArray[largeres[[All,3]]]; ] Out[18]= {1.672,Null}
It also takes some time, but it is a one-time operation again.
The following functions will be used to more efficiently retrieve results for a fixed index:
Clear[extractPositionFromSparseArray]; extractPositionFromSparseArray[HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]] Clear[getCombinationsAndFrequenciesForIndex]; getCombinationsAndFrequenciesForIndex[packedIndices_, packedCombs_, packedFreqs_, index_Integer] := With[{positions = extractPositionFromSparseArray[ SparseArray[1 - Unitize[packedIndices - index]]]}, {Extract[packedCombs, positions],Extract[packedFreqs, positions]}];
Now we have:
In[25]:= getCombinationsAndFrequenciesForIndex[subIndicesPacked,subCombsPacked,subFreqsPacked,10]
We get a 30x acceleration wrt naive Select approach.
Some complexity notes
Please note that the second solution is faster because it uses optimized data structures, but its complexity is the same as that of Select - one based, which is linear along the length of the general list of unique combinations for all indices. Therefore, theoretically discussed solutions based on a nested hash table, etc., can be asymptotically better. The problem is that in practice, we are likely to run into memory limitations long before that. For a sample of 10 million configurations, the above code was still 2-3 times faster than the fastest solution I posted earlier.
EDIT
The following modification:
Clear[getCombinationsAndFrequenciesForIndex]; getCombinationsAndFrequenciesForIndex[packedIndices_, packedCombs_, packedFreqs_, index_Integer] := With[{positions = extractPositionFromSparseArray[ SparseArray[Unitize[packedIndices - index], Automatic, 1]]}, {Extract[packedCombs, positions], Extract[packedFreqs, positions]}];
makes code twice as fast. Moreover, for more sparse indices (for example, by invoking a sampling function with parameters such as generateConfigurations[2000, 500, 500, 5000000] ), the acceleration with respect to the Select function is about 100 times.