Code golf: convert CSV to an HTML table with the least code in C#

I am adding a function to my personal library toolkit to easily convert CSV to an HTML table.

I would like to write the smallest possible piece of code in C#, and it should be able to process CSV files larger than ~500 MB.

So far my two contenders are:

  • splitting the CSV into arrays on the delimiters and building the HTML output

  • search-and-replacing the separators with table, th, tr, and td tags

Assume the file/read/disk operations are already handled; that is, I am passing a string containing the contents of the CSV to this function. The result should be plain HTML markup without styles, and yes, the data can have stray commas and line breaks in it.

Update: some people asked. 100% of the CSV I am dealing with comes directly from Excel, if that helps.

Example lines:

    a1,b1,c1\r\n
    a2,b2,c2\r\n
3 answers

Read all lines into memory:

    var lines = File.ReadAllLines(args[0]);
    using (var outfs = File.AppendText(args[1]))
    {
        outfs.Write("<html><body><table>");
        foreach (var line in lines)
            outfs.Write("<tr><td>" + string.Join("</td><td>", line.Split(',')) + "</td></tr>");
        outfs.Write("</table></body></html>");
    }

Or read one line at a time:

    using (var inFs = File.OpenText(args[0]))
    using (var outfs = File.AppendText(args[1]))
    {
        outfs.Write("<html><body><table>");
        while (!inFs.EndOfStream)
            outfs.Write("<tr><td>" + string.Join("</td><td>", inFs.ReadLine().Split(',')) + "</td></tr>");
        outfs.Write("</table></body></html>");
    }

@Jimmy: I created an extended version using LINQ. Here is the highlight (lazy evaluation, reading one line at a time):

    using (var lp = args[0].Load())
        lp.Select(l => "<tr><td>" + string.Join("</td><td>", l.Split(',')) + "</td></tr>")
          .Write("<html><body><table>", "</table></body></html>", args[1]);
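
The `Load` and `Write` helpers are not shown in the answer, so the following is only a rough sketch of a comparable lazy pipeline, not the author's code. It assumes the built-in `File.ReadLines` (which enumerates a file line by line) in place of `Load`; the names `LazyCsv`, `ToRows`, `WriteAll`, and `ConvertFile` are hypothetical. Like the versions above, it splits naively on commas and handles neither quotes nor HTML-encoding.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class; none of these names come from the answer above.
public static class LazyCsv
{
    // Lazily maps each CSV line to one <tr> row (naive split, no quote handling).
    public static IEnumerable<string> ToRows(IEnumerable<string> lines) =>
        lines.Select(l => "<tr><td>" + string.Join("</td><td>", l.Split(',')) + "</td></tr>");

    // Writes the prefix, then each item as it is produced, then the suffix.
    public static void WriteAll(this IEnumerable<string> items, string prefix, string suffix, TextWriter output)
    {
        output.Write(prefix);
        foreach (var item in items)
            output.Write(item);
        output.Write(suffix);
    }

    // File.ReadLines enumerates lazily, so rows are written as lines are read.
    public static void ConvertFile(string csvPath, string htmlPath)
    {
        using (var output = File.CreateText(htmlPath))
            ToRows(File.ReadLines(csvPath)).WriteAll("<html><body><table>", "</table></body></html>", output);
    }
}
```

Calling `LazyCsv.ConvertFile(args[0], args[1])` would then play the role of the snippet above. Because nothing is materialized into a list, memory use should stay roughly constant regardless of file size.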

You probably can't get much shorter than this, but remember that any real solution would have to handle quotes, commas inside quotes, and conversion to HTML entities.

    return "<table><tr><td>" + s
        .Replace("\n", "</td></tr><tr><td>")
        .Replace(",", "</td><td>") + "</td></tr></table>";

EDIT: here is a (mostly untested) version that adds HtmlEncode and quote handling. First I HTML-encode the input, then every comma outside quotes becomes '<' (which cannot collide, because any existing '<' has already been encoded).

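    bool q = false; // true while inside a quoted field
    return "<table><tr><td>" + new string(
            HttpUtility.HtmlEncode(s)
                // HtmlEncode turns '"' into &quot;, so restore it or the quote check below never fires
                .Replace("&quot;", "\"")
                .Select(c => c == '"' ? ((q = !q) ? c : c) : (c == ',' && !q) ? '<' : c)
                .ToArray())
        .Replace("<", "</td><td>")
        .Replace("\n", "</td></tr><tr><td>") + "</td></tr></table>";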

Here's an interesting version using lambda expressions. It's not as small as replacing commas with "</td><td>", but it has a charm of its own:

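    var r = new StringBuilder("<table>");
    s.Split('\n').ToList().ForEach(t =>
        r.Append("<tr>")
         // string.Concat joins the cells; appending the sequence itself would append its type name
         .Append(string.Concat(t.Split(',').Select(u => "<td>" + u + "</td>")))
         .Append("</tr>"));
    return r.Append("</table>").ToString();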

If I were writing this for production, I would use a state machine to track nested quotes, newlines, and commas, since Excel can put newlines in the middle of a field. IIRC, you can also specify a completely different delimiter.
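
A minimal sketch of such a state machine, under some assumptions: fields follow Excel-style quoting (a field may be wrapped in double quotes, and a doubled quote inside a quoted field stands for a literal quote), and `WebUtility.HtmlEncode` is used for the entity conversion. The names `CsvHtml` and `Convert` are made up for illustration. It reads from a `TextReader` and writes to a `TextWriter`, so it can be fed a `StringReader` over the input string or a file stream.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper; a sketch, not a full CSV parser.
public static class CsvHtml
{
    public static void Convert(TextReader input, TextWriter output, char delimiter = ',')
    {
        var field = new StringBuilder();
        bool inQuotes = false; // currently inside a quoted field
        bool rowOpen = false;  // a <tr> has been written but not yet closed

        void EndField()
        {
            if (!rowOpen) { output.Write("<tr>"); rowOpen = true; }
            output.Write("<td>" + WebUtility.HtmlEncode(field.ToString()) + "</td>");
            field.Clear();
        }

        void EndRow()
        {
            EndField();
            output.Write("</tr>");
            rowOpen = false;
        }

        output.Write("<table>");
        int next;
        while ((next = input.Read()) != -1)
        {
            char c = (char)next;
            if (inQuotes)
            {
                if (c != '"') field.Append(c); // delimiters and newlines are data here
                else if (input.Peek() == '"') { input.Read(); field.Append('"'); } // doubled quote
                else inQuotes = false; // closing quote
            }
            else if (c == '"') inQuotes = true;
            else if (c == delimiter) EndField();
            else if (c == '\n') EndRow();
            else if (c != '\r') field.Append(c); // drop the CR of a CRLF pair
        }
        if (rowOpen || field.Length > 0) EndRow(); // last row without a trailing newline
        output.Write("</table>");
    }
}
```

For example, `CsvHtml.Convert(new StringReader(s), writer)` converts the string `s`, and passing `';'` as the third argument covers the different-delimiter case. Only one field is buffered at a time, which is the main reason to prefer this shape for very large inputs.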


Source: https://habr.com/ru/post/1286347/

