How do I encode features with more than one value per column? Do I need a MultiDictVectorizer?

Question

I am encoding some features in sklearn and have run into a problem. DictVectorizer works well if your data can be encoded with one value per key for each item. What if an item can have two or more values for the same column? For example, DictVectorizer works great with an item like this:

{'a': 'b', 'b': 'c'}

But what about something similar, with more than one value per column?

{'a': ['b','c'], 'b': 'd'}

The one-hot encoding strategy could still be applied: you would just need two columns, a=b and a=c. As far as I can tell, no such vectorizer exists. What should be done in this situation? Do I need to write my own MultiDictVectorizer?
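To make the idea of separate a=b and a=c columns concrete, here is a minimal pure-Python sketch. The helper names `expand_multivalued` and `one_hot` are hypothetical (they are not part of sklearn), and the sketch assumes every value is either a string or an iterable of strings.

```python
def expand_multivalued(record):
    """Flatten {'a': ['b', 'c'], 'b': 'd'} into {'a=b': 1, 'a=c': 1, 'b=d': 1}.

    Assumption: each value is a string or an iterable of strings.
    """
    flat = {}
    for key, value in record.items():
        # A bare string counts as a single value; lists, tuples and sets are iterated.
        values = [value] if isinstance(value, str) else value
        for v in values:
            flat[f"{key}={v}"] = 1
    return flat


def one_hot(records):
    """Build a dense 0/1 matrix with one column per distinct key=value pair."""
    expanded = [expand_multivalued(r) for r in records]
    columns = sorted({name for row in expanded for name in row})
    matrix = [[row.get(name, 0) for name in columns] for row in expanded]
    return columns, matrix


columns, matrix = one_hot([{'a': 'b', 'b': 'c'}, {'a': ['b', 'c'], 'b': 'd'}])
print(columns)  # ['a=b', 'a=c', 'b=c', 'b=d']
print(matrix)   # [[1, 0, 1, 0], [1, 1, 0, 1]]
```

Because the flattened dicts contain only numeric values, they could probably also be handed to DictVectorizer directly, which passes numeric values through under the given key.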

I wrote about this on my blog here before posting.

python scikit-learn feature-extraction dictvectorizer

rjurney Feb 15 '16 at 23:46

2 answers

Guiem Bosch · Answer 1 · 2016-02-16T19:06:53+0000

In this situation, there are at least two quick possible solutions:

1. Create a new value that represents the combination of the two values:
   {'a': 'bc', 'b': 'd'}, or give it another name, e.g. 'bc' --> 'e'
2. Repeat the sample once per value, taking one of the values each time:
   {'a': 'b', 'b': 'd'} and {'a': 'c', 'b': 'd'}

But, of course, it depends on the context of your problem (case 2: is it right to "duplicate" the sample with different values? Case 1: is a conceptually different new feature value acceptable?). And I don't even know whether this multi-valued feature situation corresponds to an N/A, for example.
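The two workarounds in this answer could look roughly like the sketch below. The function names `combine_values` and `duplicate_samples` are made up for illustration, and the code assumes every value is a string or an iterable of strings. Sorting the values before joining is a design choice of this sketch, so that ['b', 'c'] and ['c', 'b'] map to the same combined category.

```python
from itertools import product


def combine_values(record, sep=""):
    """Option 1: merge multiple values into one new category, e.g. ['b', 'c'] -> 'bc'."""
    return {
        key: value if isinstance(value, str) else sep.join(sorted(value))
        for key, value in record.items()
    }


def duplicate_samples(record):
    """Option 2: emit one single-valued record per combination of values."""
    keys = list(record)
    choices = [
        [record[k]] if isinstance(record[k], str) else list(record[k])
        for k in keys
    ]
    return [dict(zip(keys, combo)) for combo in product(*choices)]


sample = {'a': ['b', 'c'], 'b': 'd'}
print(combine_values(sample))     # {'a': 'bc', 'b': 'd'}
print(duplicate_samples(sample))  # [{'a': 'b', 'b': 'd'}, {'a': 'c', 'b': 'd'}]
```

Either output contains only single string values per key, so it is in the shape that DictVectorizer already handles. Note that option 2 changes the number of samples, so any labels would have to be repeated to match.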

github, , , , .

rjurney · Answer 2 · 2016-02-16T18:47:53+0000

DictVectorizer , . , sklearn. , DictVectorizer MultiDictVectorizer .

Pull Github

Github
