How do you use the OpenXML API to read a table from an Excel spreadsheet?

I've read a bunch of material on the web about how to get at cell data using the OpenXML API, but not much of it is particularly straightforward. Most of it seems to be about writing SpreadsheetML, not reading it, and even that doesn't help much. I have a spreadsheet that has a table in it. I know the name of the table, and I can find out which sheet it is on and which columns it has. But I can't figure out how to get back a collection of rows that contain the data in the table.

I have this to load the document and get a handle to the workbook:

    SpreadsheetDocument document = SpreadsheetDocument.Open("file.xlsx", false);
    WorkbookPart workbook = document.WorkbookPart;

I have this to find a table / sheet:

    Table table = null;
    foreach (Sheet sheet in workbook.Workbook.GetFirstChild<Sheets>())
    {
        WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheet.Id);
        foreach (TableDefinitionPart tableDefinitionPart in worksheetPart.TableDefinitionParts)
        {
            if (tableDefinitionPart.Table.DisplayName == this._tableName)
            {
                table = tableDefinitionPart.Table;
                break;
            }
        }
    }

And I can iterate over the columns in the table with a foreach over table.TableColumns.

2 answers

Reading an Excel 2007/2010 spreadsheet with the OpenXML API is really easy. In some ways it is even simpler than using OleDB, which is what we always used as a quick and dirty solution. That said, the code is not just simple but also verbose, and I don't think posting all of it here would be useful if it also has to be commented and explained, so I'll write only a summary and link to a good article. Read this article on MSDN, which explains how to read XLSX documents in a very easy way.

To summarize:

  • Open the SpreadsheetDocument with SpreadsheetDocument.Open.
  • Get the Sheet you need with a LINQ query on the document's WorkbookPart.
  • Get (finally!) the WorksheetPart (the object you need) using the Id of the Sheet.

In code, with comments and error handling removed:

    using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileName, false))
    {
        Sheet sheet = document.WorkbookPart.Workbook
            .Descendants<Sheet>()
            .Where(s => s.Name == sheetName)
            .FirstOrDefault();

        WorksheetPart sheetPart =
            (WorksheetPart)(document.WorkbookPart.GetPartById(sheet.Id));
    }

Now (but inside the using block!) you just need to read the cell value:

    Cell cell = sheetPart.Worksheet.Descendants<Cell>()
        .Where(c => c.CellReference == addressName)
        .FirstOrDefault();

If you need to enumerate the rows (and there are a lot of them), you first have to get a reference to the SheetData object:

    SheetData sheetData = sheetPart.Worksheet.Elements<SheetData>().First();

Now you can query all rows and cells:

    foreach (Row row in sheetData.Elements<Row>())
    {
        foreach (Cell cell in row.Elements<Cell>())
        {
            string text = cell.CellValue.Text;
            // Do something with the cell value
        }
    }
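One caveat with the loop above: cell.CellValue is null for cells that carry no value, and for shared-string cells it holds an index into the workbook's shared string table rather than the text itself. Here is a minimal sketch of a helper that covers both cases. The class and method names (CellText.Read) are hypothetical, not part of the SDK, and inline strings (t="inlineStr") are deliberately not handled.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static class CellText
{
    // Hypothetical helper (not an SDK method): returns the text of a cell,
    // or an empty string when the cell is missing or has no value.
    public static string Read(Cell cell, WorkbookPart workbookPart)
    {
        if (cell == null || cell.CellValue == null)
            return string.Empty;

        string raw = cell.CellValue.Text;

        // For shared strings the stored value is an index into the
        // workbook-wide shared string table, not the text itself.
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            SharedStringTablePart sstPart =
                workbookPart == null ? null : workbookPart.SharedStringTablePart;
            if (sstPart == null || sstPart.SharedStringTable == null)
                return raw;

            int index;
            if (!int.TryParse(raw, out index))
                return raw;

            SharedStringItem item = sstPart.SharedStringTable
                .Elements<SharedStringItem>()
                .ElementAtOrDefault(index);
            return item != null ? item.InnerText : raw;
        }

        // Numbers, booleans, dates and formula results come back as stored.
        return raw;
    }
}
```

Inside the inner loop you would then write something like string text = CellText.Read(cell, document.WorkbookPart); instead of touching cell.CellValue directly.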

To simply enumerate a normal spreadsheet, you can use Descendants<Row>() on the Worksheet of the WorksheetPart object.

If you need more resources about OpenXML, take a look at OpenXML Developer; it contains a lot of good tutorials.


There are probably many better ways to code this, but I threw it together because I needed it, so hopefully it helps someone else.

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.IO;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    // Reads the table(s) on the first sheet of the workbook into a DataTable.
    private static DataTable genericExcelTable(FileInfo fileName)
    {
        DataTable dataTable = new DataTable();
        try
        {
            using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName.FullName, false))
            {
                Workbook wkb = doc.WorkbookPart.Workbook;
                Sheet wks = wkb.Descendants<Sheet>().FirstOrDefault();
                SharedStringTable sst = wkb.WorkbookPart.SharedStringTablePart.SharedStringTable;
                List<SharedStringItem> allSSI = sst.Descendants<SharedStringItem>().ToList<SharedStringItem>();
                WorksheetPart wksp = (WorksheetPart)doc.WorkbookPart.GetPartById(wks.Id);

                foreach (TableDefinitionPart tdp in wksp.TableDefinitionParts)
                {
                    Table excelTable = tdp.Table;

                    // One DataColumn per table column, in table order.
                    int colcounter = 0;
                    foreach (TableColumn col in excelTable.TableColumns)
                    {
                        DataColumn dcol = dataTable.Columns.Add(col.Name);
                        dcol.SetOrdinal(colcounter);
                        colcounter++;
                    }

                    SheetData data = wksp.Worksheet.Elements<SheetData>().First();
                    foreach (DocumentFormat.OpenXml.Spreadsheet.Row row in data)
                    {
                        // A row counts as part of the table if its first cell lies inside
                        // the table reference (skipping the header row). Rows with no
                        // cells at all are ignored.
                        Cell firstCell = row.Descendants<Cell>().FirstOrDefault();
                        if (firstCell != null && isInTable(firstCell, excelTable.Reference, true))
                        {
                            // Note: cells are assigned by position within the row, so this
                            // assumes every table cell is present in the sheet XML.
                            int cellcount = 0;
                            DataRow dataRow = dataTable.NewRow();
                            foreach (Cell cell in row.Elements<Cell>())
                            {
                                if (cell.DataType != null && cell.DataType.InnerText == "s")
                                {
                                    // Shared string: the value is an index into the string table.
                                    dataRow[cellcount] = allSSI[int.Parse(cell.CellValue.InnerText)].InnerText;
                                }
                                else if (cell.CellValue != null)
                                {
                                    dataRow[cellcount] = cell.CellValue.Text;
                                }
                                cellcount++;
                            }
                            dataTable.Rows.Add(dataRow);
                        }
                    }
                }
            }
            // do whatever you want with the DataTable
            return dataTable;
        }
        catch (Exception)
        {
            // handle an error
            return dataTable;
        }
    }

    // Splits a reference such as "B7" into (column number, row number).
    private static Tuple<int, int> returnCellReference(string cellRef)
    {
        int startIndex = cellRef.IndexOfAny("0123456789".ToCharArray());
        string column = cellRef.Substring(0, startIndex);
        int row = Int32.Parse(cellRef.Substring(startIndex));
        return new Tuple<int, int>(TextToNumber(column), row);
    }

    // Converts column letters to a 1-based column number (A = 1, Z = 26, AA = 27).
    private static int TextToNumber(string text)
    {
        return text
            .Select(c => c - 'A' + 1)
            .Aggregate((sum, next) => sum * 26 + next);
    }

    // True if testCell lies inside tableRef ("A1:C10" or a single cell);
    // when headerRow is true, the first row of the range is excluded.
    private static bool isInTable(Cell testCell, string tableRef, bool headerRow)
    {
        Tuple<int, int> cellRef = returnCellReference(testCell.CellReference.ToString());
        if (tableRef.Contains(":"))
        {
            int header = 0;
            if (headerRow)
            {
                header = 1;
            }
            string[] tableExtremes = tableRef.Split(':');
            Tuple<int, int> startCell = returnCellReference(tableExtremes[0]);
            Tuple<int, int> endCell = returnCellReference(tableExtremes[1]);
            return cellRef.Item1 >= startCell.Item1
                && cellRef.Item1 <= endCell.Item1
                && cellRef.Item2 >= startCell.Item2 + header
                && cellRef.Item2 <= endCell.Item2;
        }
        return cellRef.Equals(returnCellReference(tableRef));
    }
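To make the reference arithmetic in returnCellReference and TextToNumber more concrete, here is a small standalone sketch of the same idea that needs no OpenXML reference. The names CellRefDemo, ColumnToNumber and Parse are made up for the illustration. It assumes plain references such as "A1" or "AB12" (no "$" markers, at least one letter and one digit).

```csharp
using System;
using System.Linq;

static class CellRefDemo
{
    // Treats the column letters as a base-26 number with A = 1 ... Z = 26.
    public static int ColumnToNumber(string letters)
    {
        return letters.ToUpperInvariant()
                      .Select(c => c - 'A' + 1)
                      .Aggregate((sum, next) => sum * 26 + next);
    }

    // Splits a reference at its first digit: letters are the column, digits the row.
    public static Tuple<int, int> Parse(string cellRef)
    {
        int firstDigit = cellRef.IndexOfAny("0123456789".ToCharArray());
        string column = cellRef.Substring(0, firstDigit);
        int row = int.Parse(cellRef.Substring(firstDigit));
        return Tuple.Create(ColumnToNumber(column), row);
    }

    public static void Main()
    {
        Console.WriteLine(Parse("A1"));    // prints (1, 1)
        Console.WriteLine(Parse("AB12"));  // prints (28, 12): A*26 + B = 1*26 + 2
    }
}
```

With both corners of a table reference such as "B2:D10" parsed this way, checking whether a cell belongs to the table comes down to four integer comparisons, which is what isInTable does.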

Source: https://habr.com/ru/post/1402560/

