C # OPEN XML: empty cells are skipped when receiving data from EXCEL in DATATABLE

Task

Import data from excel to DataTable

Problem

A cell that does not contain any data is skipped, and the very next cell with data in the row is used as the value of the empty colony. for instance

A1 is empty A2 has the value Tom , then when importing data A1 get the value A2 and A2 remains empty

To make this very clear, I provide a few screenshots below

This is excel data

enter image description here

This is a DataTable after importing data from excel enter image description here

code

 public class ImportExcelOpenXml { public static DataTable Fill_dataTable(string fileName) { DataTable dt = new DataTable(); using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false)) { WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart; IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>(); string relationshipId = sheets.First().Id.Value; WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId); Worksheet workSheet = worksheetPart.Worksheet; SheetData sheetData = workSheet.GetFirstChild<SheetData>(); IEnumerable<Row> rows = sheetData.Descendants<Row>(); foreach (Cell cell in rows.ElementAt(0)) { dt.Columns.Add(GetCellValue(spreadSheetDocument, cell)); } foreach (Row row in rows) //this will also include your header row... { DataRow tempRow = dt.NewRow(); for (int i = 0; i < row.Descendants<Cell>().Count(); i++) { tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i)); } dt.Rows.Add(tempRow); } } dt.Rows.RemoveAt(0); //...so i'm taking it out here. return dt; } public static string GetCellValue(SpreadsheetDocument document, Cell cell) { SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart; string value = cell.CellValue.InnerXml; if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString) { return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText; } else { return value; } } } 

My thoughts

I think there is some problem with

public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;

In case I want the number of columns used with Descendants

 IEnumerable<Row> rows = sheetData.Descendants<<Row>(); int colCnt = rows.ElementAt(0).Count(); 

OR

If I get a row count using Descendants

 IEnumerable<Row> rows = sheetData.Descendants<<Row>(); int rowCnt = rows.Count();` 

In both cases, Descendants skips empty cells

Is there an alternative to Descendants .

Your suggestions are highly appreciated

PS: I also thought about getting cell values ​​using column names such as A1, A2 , but for this I will need to get the exact number of columns and rows that are not possible using the Descendants function.

+9
source share
3 answers

If there were data in all cells in a row, then everything works well. At the moment when you have at least one empty cell in a row, things are going badly.

Why is this happening in the first place ?

This is because in the code below:

 row.Descendants<Cell>().Count() 

Count() is the number of non-empty filled cells (not all columns). So, when you pass row.Descendants<Cell>().ElementAt(i) as an argument to the GetCellValue method:

 GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i)); 

Then it will find the contents of the next non-empty filled cell (not necessarily what is in the index of this column, i ), for example, if the first column is empty and we call ElementAt(1) , instead it returns the value in the second column. and all logic is confused.

Decision. We need to deal with the appearance of empty cells . In fact, we need to find out the initial index of the cell column in case there were empty cells in front of it. So you need to replace your loop code as below:

 for (int i = 0; i < row.Descendants<Cell>().Count(); i++) { tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i)); } 

with

 for (int i = 0; i < row.Descendants<Cell>().Count(); i++) { Cell cell = row.Descendants<Cell>().ElementAt(i); int actualCellIndex = CellReferenceToIndex(cell); tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell); } 

and add the method below to your code, which is used in the above fragment of the modified code to get the source / correct column index of any cell:

 private static int CellReferenceToIndex(Cell cell) { int index = 0; string reference = cell.CellReference.ToString().ToUpper(); foreach (char ch in reference) { if (Char.IsLetter(ch)) { int value = (int)ch - (int)'A'; index = (index == 0) ? value : ((index + 1) * 26) + value; } else return index; } return index; } 
+10
source
 public void Read2007Xlsx() { try { DataTable dt = new DataTable(); using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"D:\File.xlsx", false)) { WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart; IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>(); string relationshipId = sheets.First().Id.Value; WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId); Worksheet workSheet = worksheetPart.Worksheet; SheetData sheetData = workSheet.GetFirstChild<SheetData>(); IEnumerable<Row> rows = sheetData.Descendants<Row>(); foreach (Cell cell in rows.ElementAt(0)) { dt.Columns.Add(GetCellValue(spreadSheetDocument, cell)); } foreach (Row row in rows) //this will also include your header row... { DataRow tempRow = dt.NewRow(); int columnIndex = 0; foreach (Cell cell in row.Descendants<Cell>()) { // Gets the column index of the cell with data int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference)); cellColumnIndex--; //zero based index if (columnIndex < cellColumnIndex) { do { tempRow[columnIndex] = ""; //Insert blank data here; columnIndex++; } while (columnIndex < cellColumnIndex); } tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell); columnIndex++; } dt.Rows.Add(tempRow); } } dt.Rows.RemoveAt(0); //...so i'm taking it out here. } catch (Exception ex) { } } /// <summary> /// Given a cell name, parses the specified cell to get the column name. /// </summary> /// <param name="cellReference">Address of the cell (ie. B2)</param> /// <returns>Column Name (ie. B)</returns> public static string GetColumnName(string cellReference) { // Create a regular expression to match the column name portion of the cell name. Regex regex = new Regex("[A-Za-z]+"); Match match = regex.Match(cellReference); return match.Value; } /// <summary> /// Given just the column name (no row index), it will return the zero based column index. /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). /// A length of three can be implemented when needed. /// </summary> /// <param name="columnName">Column Name (ie. A or AB)</param> /// <returns>Zero based index if the conversion was successful; otherwise null</returns> public static int? GetColumnIndexFromName(string columnName) { //return columnIndex; string name = columnName; int number = 0; int pow = 1; for (int i = name.Length - 1; i >= 0; i--) { number += (name[i] - 'A' + 1) * pow; pow *= 26; } return number; } public static string GetCellValue(SpreadsheetDocument document, Cell cell) { SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart; if (cell.CellValue ==null) { return ""; } string value = cell.CellValue.InnerXml; if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString) { return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText; } else { return value; } } 
+5
source

Try this code, I made small changes, and it worked for me.

  public static DataTable Fill_dataTable(string filePath) { DataTable dt = new DataTable(); using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, false)) { Sheet sheet = doc.WorkbookPart.Workbook.Sheets.GetFirstChild<Sheet>(); Worksheet worksheet = doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart.Worksheet; IEnumerable<Row> rows = worksheet.GetFirstChild<SheetData>().Descendants<Row>(); DataTable dt = new DataTable(); List<string> columnRef = new List<string>(); foreach (Row row in rows) { if (row.RowIndex != null) { if (row.RowIndex.Value == 1) { foreach (Cell cell in row.Descendants<Cell>()) { dt.Columns.Add(GetValue(doc, cell)); columnRef.Add(cell.CellReference.ToString().Substring(0, cell.CellReference.ToString().Length - 1)); } } else { dt.Rows.Add(); int i = 0; foreach (Cell cell in row.Descendants<Cell>()) { while (columnRef(i) + dt.Rows.Count + 1 != cell.CellReference) { dt.Rows(dt.Rows.Count - 1)(i) = ""; i += 1; } dt.Rows(dt.Rows.Count - 1)(i) = GetValue(doc, cell); i += 1; } } } } } return dt; } 
0
source

Source: https://habr.com/ru/post/1245374/


All Articles