Under RFC 4180, fields must be parsed from left to right in order to interpret double quotes correctly. The meaning of "" depends on context: inside a quoted field it is an escaped double quote; as the whole of a field it is an empty string; and inside the value of an unquoted, non-empty field it is two literal double quotes.
For example, consider a file with 4 records (1 column each):
"field""value" CRLF "" CRLF field""value CRLF "field value" extra CRLF
"field""value" - read as field"value"" - should be read as an empty stringfield""value - read as field""value"field value" extra - can be read as field value extra or you can reject it
Record 4 is actually an invalid field, so you can choose either to accept it or to reject it.
When you start reading a field, check whether the first character is a double quote. If it is, the field value is quoted, and you need to read until you find the unescaped closing double quote. While doing so, treat newlines and commas as part of the value, since the field is quoted: it ends only when the closing double quote is reached.
If the first character is not a double quote, then all double quotes in the field value should be treated as literal double quotes. In this case, you reach the end of the field when you encounter a comma or a newline character.
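The two rules above can be sketched as a small hand-written parser. This is only an illustration under some assumptions: the function name `parse_csv` and its `strict` flag are made up for this example, the delimiter is fixed to a comma, and text after a closing quote is either rejected (`strict=True`) or appended to the value (`strict=False`).

```python
def parse_csv(text, strict=True):
    """Parse CSV text into a list of records (lists of field strings).

    Hypothetical helper for illustration; not a complete RFC 4180 parser.
    """
    records, record = [], []
    i, n = 0, len(text)
    while i < n:
        if text[i] == '"':
            # Quoted field: read until the unescaped closing quote.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                if text[i] == '"':
                    if i + 1 < n and text[i + 1] == '"':
                        chars.append('"')  # "" is one escaped quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                chars.append(text[i])  # commas and newlines are data here
                i += 1
            # Text between the closing quote and the delimiter is invalid.
            start = i
            while i < n and text[i] not in ',\r\n':
                i += 1
            extra = text[start:i]
            if extra:
                if strict:
                    raise ValueError("text after closing quote: %r" % extra)
                chars.append(extra)
            field = ''.join(chars)
        else:
            # Unquoted field: quotes are literal; stop at comma or newline.
            start = i
            while i < n and text[i] not in ',\r\n':
                i += 1
            field = text[start:i]
        record.append(field)
        if i < n and text[i] == ',':
            i += 1
            if i == n:  # trailing comma at end of input: final empty field
                record.append('')
                records.append(record)
                record = []
            continue
        # End of record: consume CRLF, a lone LF, or a lone CR.
        if i < n and text[i] == '\r':
            i += 1
        if i < n and text[i] == '\n':
            i += 1
        records.append(record)
        record = []
    return records
```

With the default `strict=True`, the first three example records parse as described and record 4 raises `ValueError`; with `strict=False`, record 4 is read as field value extra.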
Based on this, I would recommend always quoting all fields when writing records, and writing a proper parser to parse the records when reading the data back. This way you can store any data in your CSV files (even multi-line text with embedded quotes), and your format will be unambiguous. When reading a CSV file, I would refuse to process any file that cannot be parsed correctly. If it is a database, you can expect users not to edit records by hand unless they know what they are doing.
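If you happen to be using Python, the standard `csv` module can do both halves of this recommendation. The following is a minimal sketch, not the only way to do it; the helper names `write_quoted` and `read_strict` are invented for this example. `csv.QUOTE_ALL` quotes every field on output, and `strict=True` makes the reader raise `csv.Error` on malformed input instead of guessing.

```python
import csv
import io


def write_quoted(rows):
    """Serialize rows to CSV text with every field quoted."""
    buf = io.StringIO(newline='')  # keep the writer's CRLF line endings as-is
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    return buf.getvalue()


def read_strict(data):
    """Parse CSV text, raising csv.Error on malformed input."""
    return list(csv.reader(io.StringIO(data, newline=''), strict=True))


# Multi-line text, embedded quotes, and an empty field all round-trip.
rows = [['line one\nline two', 'say "hi"', '']]
text = write_quoted(rows)
assert read_strict(text) == rows
```

A record such as `"field value" extra` makes `read_strict` raise `csv.Error`, which matches the advice to reject files that do not parse cleanly.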