How to iterate over two columns in Python?

I'm trying to iterate over two columns in a CSV file using Python. I heard that you need to import pandas for this, but I'm struggling with the coding part.

import csv as csv
import numpy as np
import pandas as pd

csv_file_object = csv.reader(open('train.csv', 'rb'))  # Load in the csv file
header = csv_file_object.next()                   # Skip the first line as it is a header
data=[]                                     # Create a variable to hold the data

for row in csv_file_object:                      # Skip through each row in the csv file,
    data.append(row[0:])                        # adding each row to the data variable
data = np.array(data)   



def number_of_female_in_class_3(data):
    for row in data.iterow:
        if row[2] == 'female' and row[4] == '3':
            sum += 1

The problem is with the number_of_female_in_class_3 function. I want to go through two columns: check whether column 2 contains the string "female" and whether column 4 contains the status "3". If both are true, I want to add 1 to sum.

I was wondering if anyone can post simple code on how to do this?

Here is the train.csv file I'm trying to extract the data from.

**PassengerID** | **Survived** | **Pclass**   | **Name**  |  **Sex**   |
          1     |          0   |         3    |  mary     |  Female    |
          2     |          1   |         2    |  james    |  Male      |
          3     |          1   |         3    |  Tanya    |  Female    |

thanks


This is easy to do with pandas.

Given a CSV file like this:

PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female

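With a regular comma-separated file like this, saved as data.csv, pandas can load and filter it. For example, to select the female passengers who did not survive: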

>>> import pandas as pd
>>> df = pd.read_csv('data.csv', index_col=0)
>>> result = df[(df.Sex == 'female') & (df.Survived == 0)]

The result is a DataFrame:

>>> result
             Survived  Pclass  Name     Sex
PassengerID                                
1                   0       3  mary  female

Use len(result) to get the number of matching rows.
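
The same approach applies to the condition in the question (female passengers in class 3). Below is a minimal sketch, assuming the comma-separated sample shown earlier; the data is inlined through io.StringIO only so the snippet runs on its own, and the names mask and female_in_class_3 are illustrative:

```python
import io
import pandas as pd

# Sample data copied from the comma-separated example; inlined for a standalone run.
csv_text = """PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Boolean mask: True for rows where both column conditions hold.
mask = (df["Sex"] == "female") & (df["Pclass"] == 3)

# Summing a boolean mask counts the True values.
female_in_class_3 = int(mask.sum())
print(female_in_class_3)
```

Summing the mask avoids building the intermediate filtered DataFrame when only the count is needed.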


If your CSV is pipe-delimited like the one in the question, you can load it and clean up df like this:

# Load using a different delimiter.
# (DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it.)
df = pd.read_csv('data.csv', sep="|", index_col=0)

# Rename the index.
df.index.names = ['PassID']

# Rename the columns, using X for the bogus one.
df.columns = ['Survived', 'Pclass', 'Name', 'Sex', 'X']

# Remove the 'extra' column.
del df['X']
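
The padded fields in a pipe-delimited file also carry surrounding whitespace, which makes string comparisons fail. Here is a hedged sketch of one way to handle that in pandas, assuming a layout like the table in the question (the sample text, the choice to read everything as strings, and the name count are assumptions for illustration):

```python
import io
import pandas as pd

# Hypothetical pipe-delimited sample modelled on the table in the question.
text = """PassengerID | Survived | Pclass | Name  | Sex    |
1           | 0        | 3      | mary  | Female |
2           | 1        | 2      | james | Male   |
3           | 1        | 3      | Tanya | Female |
"""

# Read every field as a string so the padding can be stripped uniformly.
df = pd.read_csv(io.StringIO(text), sep="|", dtype=str)

# The trailing '|' on each line produces an extra, completely empty column.
df = df.dropna(axis=1, how="all")

# Strip padding from the column names and from every value.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

# Convert the class column back to integers for numeric comparison.
df["Pclass"] = df["Pclass"].astype(int)

# Compare case-insensitively, since the file spells it 'Female'.
count = int(((df["Sex"].str.lower() == "female") & (df["Pclass"] == 3)).sum())
print(count)
```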

Here is a working version of your code that uses only the csv module:

import csv

def number_of_female_in_class_3(data):
    # initialize sum variable
    sum = 0
    for row in data:
        if row[4] == 'Female' and row[2] == '3':
            # match
            sum += 1
    # return the result
    return sum

# Load in the csv file (Python 3: open in text mode with newline='')
with open('train.csv', newline='') as f:
    csv_file_object = csv.reader(f, delimiter='|')
    # skip the header
    header = next(csv_file_object)
    data = []

    for row in csv_file_object:
        # add each row of data to the data list, stripping excess whitespace
        data.append(list(map(str.strip, row)))

# print the result
print(number_of_female_in_class_3(data))

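Changes:

'Female' is compared with a capital F, as it appears in the file. The column indexes (columns 5 and 3) are counted from 0, so they become row[4] and row[2]. numpy and pandas are not needed here, so those imports are removed. Each field is stripped of surrounding whitespace (map(str.strip, row)), and delimiter='|' is passed to csv.reader because the file is pipe-delimited. Finally, sum is initialized to 0 and the function ends with return sum.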
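
Index-based lookups such as row[4] and row[2] are easy to get wrong by one. As an alternative sketch (not part of the answer above), fields can be looked up by header name with only the standard library. The sample text and the function name count_female_in_class_3 are hypothetical, and the data is inlined so the snippet runs without train.csv:

```python
import csv
import io

# Hypothetical pipe-delimited sample modelled on the table in the question.
sample = """PassengerID | Survived | Pclass | Name  | Sex    |
1           | 0        | 3      | mary  | Female |
2           | 1        | 2      | james | Male   |
3           | 1        | 3      | Tanya | Female |
"""

def count_female_in_class_3(lines):
    """Count rows whose Sex is female (any case) and whose Pclass is '3'."""
    reader = csv.reader(lines, delimiter="|")
    # Strip padding from the header names so they can be used as keys.
    header = [name.strip() for name in next(reader)]
    count = 0
    for row in reader:
        # Pair each stripped field with its column name.
        record = dict(zip(header, (field.strip() for field in row)))
        if record["Sex"].lower() == "female" and record["Pclass"] == "3":
            count += 1
    return count

result = count_female_in_class_3(io.StringIO(sample))
print(result)
```

To run it against a real file, pass an open file object (opened with newline='') instead of the io.StringIO wrapper.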


Source: https://habr.com/ru/post/1617633/

