This is my first post, so I hope I don't wander off topic and that I make myself clear. Basically, this is a two-part question. I need to write code that first checks whether column A equals "VALID". If it does, I need to extract a substring from column B and place it in a new column, "C". If the condition is false, I would put "NA" in column C instead. Look at the second table for my desired result.
| A       | B                                  |
|---------|------------------------------------|
| VALID   | asdfafX'XextractthisY'Yeaaadf      |
| INVALID | secondrowX'XsubtextY'Yelakj        |
| VALID   | secondrowX'XextractthistooY'Yelakj |

| A       | B                                  | C              |
|---------|------------------------------------|----------------|
| VALID   | asdfafX'XextractthisY'Yeaaadf      | extractthis    |
| INVALID | secondrowX'XsubtextY'Yelakj        | NA             |
| VALID   | secondrowX'XextractthistooY'Yelakj | extractthistoo |
A few notes:
- A substring always starts after the phrase "X'X" and ends right before "Y'Y".
- The substring will have different lengths from cell to cell.
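Because the substring is marked by fixed delimiters rather than fixed positions, a regular expression with a non-greedy group fits better than `str[start:finish]` slicing. A minimal sketch with Python's standard `re` module, assuming each cell contains the markers at most once (only the first match is used); the helper name `extract_between` is hypothetical:

```python
import re

# Non-greedy group: capture as little as possible between X'X and the next Y'Y,
# so the captured length adapts to each cell
PATTERN = re.compile(r"X'X(.*?)Y'Y")

def extract_between(text):
    """Return the text between X'X and Y'Y, or None if the markers are absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

for cell in ["asdfafX'XextractthisY'Yeaaadf",
             "secondrowX'XextractthistooY'Yelakj",
             "no markers here"]:
    print(extract_between(cell))
```

The same pattern string can be passed to pandas' `Series.str.extract` to apply it to a whole column at once.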
I know that the following code is incorrect, but I wanted to show you how I tried to solve this problem:
```python
import pandas as pd

if df[A] == "VALID":
    df[C] = df[B]df.str[start:finish]
else:
    df[C].isna()
```
I apologize for the errors in this base code; I am new to Python in general and still rely on the IDE and trial and error to guide me. Any help you can provide is appreciated.
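For comparison with the attempt above: even with the column names quoted (`df["A"]`), `if df["A"] == "VALID":` cannot work, because the comparison yields a boolean Series with one value per row, and pandas raises a `ValueError` when asked whether a whole Series is true. A vectorized sketch instead builds the boolean mask once and picks a value per row with `numpy.where`. It assumes the columns are literally named "A", "B" and "C", and that "NA" means the literal string "NA"; swap in `np.nan` if a real missing value is preferred.

```python
import numpy as np
import pandas as pd

# Sample data reproducing the first table
df = pd.DataFrame({
    "A": ["VALID", "INVALID", "VALID"],
    "B": [
        "asdfafX'XextractthisY'Yeaaadf",
        "secondrowX'XsubtextY'Yelakj",
        "secondrowX'XextractthistooY'Yelakj",
    ],
})

# Text between X'X and Y'Y for every row (NaN where the markers are missing)
extracted = df["B"].str.extract(r"X'X(.*?)Y'Y", expand=False)

# Row-wise if/else: keep the extracted text where A == "VALID", else "NA"
df["C"] = np.where(df["A"] == "VALID", extracted, "NA")
print(df)
```

With a single capture group and `expand=False`, `str.extract` returns a Series rather than a DataFrame, which is why it can be passed straight to `np.where`.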