My first problem is how to remove all non-numeric parts from numbers, such as "100M" and "0N #", which should be 100 and 0 respectively.
import re
import pandas as pd

df = pd.read_csv(yourfile, header=None)
df.columns = ['ID'] + list(df.columns)[1:]
# Strip everything except digits and the decimal point from string cells
df = (df.stack()
        .apply(lambda v: re.sub('[^0-9.]', '', v) if isinstance(v, str) else v)
        .astype(float)
        .unstack())
df.groupby('ID').agg(['std', 'mean'])

Here .stack() converts the dataframe to a series, .apply() calls the lambda for each value, re.sub() removes every character that is not a digit or a decimal point, .astype() converts the result to a numeric value, and .unstack() converts the series back to a dataframe. Because the decimal point is kept, this works for both integers and floating point numbers.
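To see what the cleaning step does to individual values, here is a minimal sketch that uses only the standard library. The helper name `strip_non_numeric` is made up for illustration, and it assumes the decimal point should be kept so that values like "98.4M" survive as 98.4.

```python
import re

def strip_non_numeric(value):
    """Hypothetical helper: keep only digits and the decimal point."""
    if not isinstance(value, str):
        return value  # already numeric, leave it untouched
    return re.sub(r'[^0-9.]', '', value)

samples = ["100M", "0N #", "98.4M", 55]
cleaned = [strip_non_numeric(v) for v in samples]
print(cleaned)                      # ['100', '0', '98.4', 55]
print([float(v) for v in cleaned])  # [100.0, 0.0, 98.4, 55.0]
```

Note that a cell with no digits at all (for example "N#") would be reduced to an empty string, and converting that to float raises a ValueError, so such cells would need separate handling.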
Given a specific column, I would like to split the rows by identifier, and then output the mean and standard deviation for each identifier.
# for all columns
df.groupby('ID').agg(['std', 'mean'])
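For a single column rather than all of them, one option is to select the column after grouping and before aggregating. The sketch below uses a small made-up frame with a hypothetical column name 'score'; with the data from this answer the columns carry integer labels, so the selection would look like df.groupby('ID')[2] instead.

```python
import pandas as pd

# Toy frame standing in for the cleaned data; 'score' is a made-up column name.
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'score': [100.0, 55.0, 65.0, 75.0],
})

# Select one column from the grouped frame, then compute per-identifier stats.
stats = df.groupby('ID')['score'].agg(['mean', 'std'])
print(stats)
```

pandas computes the sample standard deviation here (ddof=1 by default), so a group with a single row would give NaN for std.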

The data used in the example:
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

s = """
1,98.4,100M,55M,65M,75M,100M,75M,65M,100M,98M,100M,100M,92M,0#,0N#,
1,98.4,100M,55M,65M,75M,100M,75M,65M,100M,98M,100M,100M,92M,0#,0N#,
2,98.4,100M,55M,65M,75M,100M,75M,65M,100M,98M,100M,100M,92M,0#,0N#,
2,98.4,100M,55M,65M,75M,100M,75M,65M,100M,98M,100M,100M,92M,0#,0N#,
"""
yourfile = StringIO(s)