Duplication type:
Check this column only (default)
Check other columns only
Check all columns
Use Last Value:
True - retain the last duplicate value
False - retain the first of the duplicates (default)
This rule should add a new column to the DataFrame that contains the same value as the original column for any unique rows, and zero for any repeated rows.
The base code is df.loc[df.duplicated(), get_unique_column_name(df, "clean")] = df[get_column_name(df, column)], with the parameters for duplicated() set according to the duplication type.
See the documentation for this function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html
You must specify the columns in the subset parameter based on the duplication_type parameter
You must set the keep parameter of duplicated() based on Use Last Value above
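The requirements above could be sketched roughly as follows. This is only an illustration under several assumptions: the function name add_clean_column, the duplication_type strings, the default "<column>_clean" name, and the column names of the sample frame are all hypothetical, and the helpers get_unique_column_name() and get_column_name() are not reproduced because they are not shown. The sketch follows the prose description (original value on rows that are kept, zero on repeats), so it negates the duplicated() mask rather than assigning on the flagged rows directly.

```python
import pandas as pd


def add_clean_column(df, column, duplication_type="this_column",
                     use_last_value=False, clean_name=None):
    """Return a copy of df with a column that mirrors `column` on rows that
    are kept and holds 0 on rows flagged as repeats."""
    # Map the duplication type onto the `subset` argument of duplicated().
    if duplication_type == "this_column":
        subset = [column]
    elif duplication_type == "other_columns":
        subset = [c for c in df.columns if c != column]
    elif duplication_type == "all_columns":
        subset = None  # None makes duplicated() compare every column
    else:
        raise ValueError(f"unknown duplication_type: {duplication_type!r}")

    # keep="last" leaves the last occurrence unflagged, keep="first" the first.
    keep = "last" if use_last_value else "first"
    is_repeat = df.duplicated(subset=subset, keep=keep)

    result = df.copy()
    # Assumed naming scheme; the original get_unique_column_name() is not shown.
    clean_name = clean_name or f"{column}_clean"
    # where() keeps the value where the condition is True and writes 0 elsewhere.
    result[clean_name] = result[column].where(~is_repeat, 0)
    return result


# Column names are assumed; the sample file has no header row.
sample = pd.DataFrame(
    [
        ["Jason", "Miller", 42, 4, 25],
        ["Tina", "Ali", 36, 31, 57],
        ["Jake", "Milner", 24, 2, 62],
        ["Jason", "Miller", 42, 4, 25],
        ["Jake", "Milner", 24, 2, 62],
        ["Amy", "Cooze", 73, 3, 70],
    ],
    columns=["first_name", "last_name", "age", "pre_score", "post_score"],
)

first_kept = add_clean_column(sample, "age")
last_kept = add_clean_column(sample, "age", use_last_value=True)
print(first_kept["age_clean"].tolist())
print(last_kept["age_clean"].tolist())
```

With the defaults, the second Jason Miller and Jake Milner rows are the ones that receive 0. With use_last_value=True it is the earlier occurrences instead. Returning a copy rather than modifying df in place is a design choice of this sketch, not something the description requires.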
This is my file.
Jason Miller 42 4 25
Tina Ali 36 31 57
Jake Milner 24 2 62
Jason Miller 42 4 25
Jake Milner 24 2 62
Amy Cooze 73 3 70
Jason Miller 42 4 25
Jason Miller 42 4 25
Jake Milner 24 2 62
Jake Miller 42 4 25
, pandas.in . 2 .
Jason Miller 42 4 25
Jake Ali 36 31 57
Jake Milner 24 2 62
Jason Miller 4 25
Jake Milner 2 62
Jake Cooze 73 3 70
Jason Miller 4 25
Jason Miller 4 25
Jake Milner 2 62
Jake Miller 4 25