I'm starting work on a TensorFlow project, and I'm in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features: this is a fairly extensive dataset. Even after pre-processing and cleaning, I have many columns.
The traditional way to create a feature_column is shown in the TensorFlow tutorials and even in qaru.site/questions/1687825/... . You essentially declare and initialize a TensorFlow object for each feature column:
gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])
This works fine if there are only a few columns in your dataset, but in my case I certainly do not want to have hundreds of lines of code initializing different feature_column objects.
What is the best way to solve this problem? I notice that in the tutorial all the columns are assembled into a list:
base_columns = [
    gender, native_country, education, occupation, workclass, relationship,
    age_buckets,
]
which is ultimately passed to your estimator:
m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)
So, is the ideal way to handle feature_column creation for hundreds of columns to append them to a list programmatically? Something like this?
from pandas.api.types import is_string_dtype, is_numeric_dtype

my_columns = []
for col in df.columns:
    if is_string_dtype(df[col]):
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=df[col].nunique()))
    elif is_numeric_dtype(df[col]):
        my_columns.append(tf.feature_column.numeric_column(col))
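To make the dtype-dispatch idea concrete, here is a minimal self-contained sketch using only pandas (the DataFrame is a hypothetical stand-in for my real dataset, and the spec tuples mark where the actual tf.feature_column calls would go):

import pandas as pd
from pandas.api.types import is_string_dtype, is_numeric_dtype

# Toy frame standing in for the real dataset.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "age": [25, 32, 47],
    "occupation": ["tech", "sales", "tech"],
})

# Dispatch on dtype, as in the loop above. Each spec maps to a call:
#   ("hash", col, n)  -> tf.feature_column.categorical_column_with_hash_bucket(
#                            col, hash_bucket_size=n)
#   ("numeric", col)  -> tf.feature_column.numeric_column(col)
specs = []
for col in df.columns:
    if is_string_dtype(df[col]):
        specs.append(("hash", col, df[col].nunique()))
    elif is_numeric_dtype(df[col]):
        specs.append(("numeric", col))

print(specs)
# [('hash', 'gender', 2), ('numeric', 'age'), ('hash', 'occupation', 2)]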
Is this the best way to create these columns? Or am I missing some TensorFlow feature that lets me skip this step entirely?