Parsing a table using Microsoft.Office.Interop.Word, get only text from the first column?

I am working on writing a program that will analyze text data from a Microsoft Word 2010 document. In particular, I want to get text from each cell in the first column of each table in the document.

For reference, the document looks like this: [image]

I only need the text from the cells in the first column on each page. I am going to add this text to an internal datatable.

My code so far looks like this:

    private void button1_Click(object sender, EventArgs e)
    {
        // Create an instance of the Open File Dialog Box
        var openFileDialog1 = new OpenFileDialog();

        // Set filter options and filter index
        openFileDialog1.Filter = "Word Documents (.docx)|*.docx|All files (*.*)|*.*";
        openFileDialog1.FilterIndex = 1;
        openFileDialog1.Multiselect = false;

        // Call the ShowDialog method to show the dialog box.
        openFileDialog1.ShowDialog();
        txtDocument.Text = openFileDialog1.FileName;

        var word = new Microsoft.Office.Interop.Word.Application();
        object miss = System.Reflection.Missing.Value;
        object path = openFileDialog1.FileName;
        object readOnly = true;
        var docs = word.Documents.Open(ref path, ref miss, ref readOnly,
            ref miss, ref miss, ref miss, ref miss, ref miss, ref miss,
            ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);

        // Datatable to store text from Word doc
        var dt = new System.Data.DataTable();
        dt.Columns.Add("Text");

        // Loop through each table in the document,
        // grab only text from cells in the first column
        // in each table.
        foreach (Table tb in docs.Tables)
        {
            // insert code here to get text from cells in first column
            // and insert into datatable.
        }

        ((_Document)docs).Close();
        ((_Application)word).Quit();
    }

I am stuck on the part where I grab the text from each cell and add it to my datatable. Can someone give me some pointers? I would appreciate it.

Thanks!

1 answer

I don't know how you want to store it in your database, but to read the text, I think you could loop over the rows of each table and take the cell in the first column of each row:

    foreach (Table tb in docs.Tables)
    {
        for (int row = 1; row <= tb.Rows.Count; row++)
        {
            var cell = tb.Cell(row, 1);
            var text = cell.Range.Text;
            // text now contains the content of the cell.
        }
    }
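
One detail worth knowing when storing the result: the string returned by `cell.Range.Text` ends with Word's end-of-cell marker, a carriage return followed by a BEL character (`"\r\a"`), so you will probably want to strip it before adding the value to the datatable. A minimal sketch of that step follows. The names `CellTextCleaner`, `Clean`, and `AddCell` are made up for illustration and are not part of the Interop API, and the sketch assumes the datatable has the single "Text" column from the question.

```csharp
using System;
using System.Data;

// Hypothetical helper (not part of Microsoft.Office.Interop.Word).
public static class CellTextCleaner
{
    // Word terminates each cell's Range.Text with "\r\a":
    // a carriage return (U+000D) followed by BEL (U+0007).
    private static readonly char[] EndOfCellChars = { '\r', '\a' };

    // Returns the cell text without the trailing end-of-cell marker.
    // Note: this also drops trailing empty paragraphs inside the cell.
    public static string Clean(string rawCellText)
    {
        if (rawCellText == null) return string.Empty;
        return rawCellText.TrimEnd(EndOfCellChars);
    }

    // Adds one cleaned cell value as a new row.
    // Assumes the table has a single string column, as in the question.
    public static void AddCell(DataTable table, string rawCellText)
    {
        table.Rows.Add(Clean(rawCellText));
    }
}
```

Inside the inner loop above, the call would then be `CellTextCleaner.AddCell(dt, cell.Range.Text);`. Be aware that `tb.Cell(row, 1)` may throw for tables with merged cells, so a `try`/`catch` around that call could be worth considering if your documents contain them.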

Source: https://habr.com/ru/post/1492780/

