Creating multiple feature columns in TensorFlow

I'm starting work on a TensorFlow project, and I'm in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features; it is a fairly extensive dataset. Even after preprocessing and cleaning, I still have a lot of columns.

The traditional way to create a feature_column is described in the TensorFlow tutorial and even in this qaru.site/questions/1687825 / ... . You essentially declare and initialize a TensorFlow object for each feature column:

gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])

This is all well and good if your dataset has only a few columns, but in my case I certainly do not want hundreds of lines of code initializing different feature_column objects.

What is the best way to solve this problem? I notice that in the tutorial all the columns are assembled as a list:

base_columns = [
    gender, native_country, education, occupation, workclass, relationship,
    age_buckets,
]

That list is ultimately passed into your estimator:

m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)

So is the ideal way to create a feature_column for each of hundreds of columns to append them directly to a list? Something like this?

my_columns = []

for col in df.columns:
    if is_string_dtype(df[col]): #is_string_dtype is pandas function
        my_column.append(tf.feature_column.categorical_column_with_hash_bucket(col, 
            hash_bucket_size= len(df[col].unique())))

    elif is_numeric_dtype(df[col]): #is_numeric_dtype is pandas function
        my_column.append(tf.feature_column.numeric_column(col))

Is this the best way to create these columns? Or am I missing some TensorFlow functionality that would let me skip this step?

2 answers

What you have makes sense to me. :) Copying from your own code (with my_column corrected to my_columns):

my_columns = []

for col in df.columns:
  if is_string_dtype(df[col]): #is_string_dtype is pandas function
    my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(col, 
        hash_bucket_size= len(df[col].unique())))

  elif is_numeric_dtype(df[col]): #is_numeric_dtype is pandas function
    my_columns.append(tf.feature_column.numeric_column(col))
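
One thing to watch in the loop above: with hash_bucket_size equal to the number of unique values, some distinct values will most likely land in the same bucket. Below is a minimal sketch of one way around that, assuming you are happy to use categorical_column_with_vocabulary_list for low-cardinality string columns and a larger hash bucket for the rest. The helper names plan_feature_columns and to_feature_columns, and the knobs max_vocab_size and hash_multiplier, are hypothetical and not part of TensorFlow; the planning step only needs pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype


def plan_feature_columns(df, max_vocab_size=50, hash_multiplier=4):
    """Return {column name: (kind, argument)} without touching TensorFlow.

    kind is "vocabulary", "hash", or "numeric". max_vocab_size and
    hash_multiplier are hypothetical tuning knobs, not TensorFlow parameters.
    """
    plan = {}
    for col in df.columns:
        if is_string_dtype(df[col]):
            values = sorted(df[col].dropna().unique())
            if len(values) <= max_vocab_size:
                # Few distinct values: an explicit vocabulary avoids collisions.
                plan[col] = ("vocabulary", values)
            else:
                # Many distinct values: hash into more buckets than values.
                plan[col] = ("hash", hash_multiplier * len(values))
        elif is_numeric_dtype(df[col]):
            plan[col] = ("numeric", None)
        # Columns of any other dtype are skipped, as in the original loop.
    return plan


def to_feature_columns(plan):
    """Turn a plan into tf.feature_column objects (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so planning works without it

    fc = tf.feature_column
    columns = []
    for name, (kind, arg) in plan.items():
        if kind == "vocabulary":
            columns.append(fc.categorical_column_with_vocabulary_list(name, arg))
        elif kind == "hash":
            columns.append(
                fc.categorical_column_with_hash_bucket(name, hash_bucket_size=arg))
        else:
            columns.append(fc.numeric_column(name))
    return columns


# Toy data, made up for illustration only.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "age": [39, 50, 38],
    "user_id": ["a", "b", "c"],
})
plan = plan_feature_columns(df, max_vocab_size=2)
print(plan)
```

Calling to_feature_columns(plan) then gives a list you can pass as feature_columns, the same way base_columns is used in the question. Keeping the plan as plain data also makes it easy to inspect or override individual columns before any TensorFlow objects are built.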

I ended up using my own code from the question. I edited it slightly (it should be my_columns instead of my_column inside the for loop) and am posting it the way it worked for me.

import pandas.api.types as ptypes
import tensorflow as tf

my_columns = []

for col in df.columns:
  if ptypes.is_string_dtype(df[col]): #is_string_dtype is pandas function
    my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(col, 
        hash_bucket_size= len(df[col].unique())))

  elif ptypes.is_numeric_dtype(df[col]): #is_numeric_dtype is pandas function
    my_columns.append(tf.feature_column.numeric_column(col))
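
A caveat with this if/elif loop: any column that is neither a string nor a numeric dtype (a datetime column, for example) falls through both branches and silently gets no feature column. Here is a small sketch of a check you could run first, assuming the same pandas dtype tests as above; the helper name unhandled_columns and the sample data are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype


def unhandled_columns(df):
    """List the columns that the string/numeric loop would skip."""
    return [
        col for col in df.columns
        if not (is_string_dtype(df[col]) or is_numeric_dtype(df[col]))
    ]


# Toy data: "signup" is a datetime column, so neither branch matches it.
df = pd.DataFrame({
    "gender": ["Female", "Male"],
    "age": [39, 50],
    "signup": pd.to_datetime(["2017-01-01", "2017-06-15"]),
})
skipped = unhandled_columns(df)
print(skipped)
```

If the list is not empty, you can decide per column whether to convert it (for example to a numeric timestamp) or to leave it out on purpose.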

Source: https://habr.com/ru/post/1687823/