I have a large text document filled with random words, URLs, email addresses, etc. Example: "word 2014 john@doe.com http://www.example.com/ http://example.com/image.gif ". The real data may look different: there may be line breaks, multiple spaces, tabs, etc. The data can also become huge very quickly (this is a kind of bookmarking service, so data arrives constantly in the form of images, text and hyperlinks).
Another example of content in a text document (the one I use for testing):
http://movpod.in/images3/MovPod-logo.png
https://dt8kf6553cww8.cloudfront.net/static/images/developers/chooser-drawing-vfln1ftk6.png
http://xregexp.com/assets/regex_cookbook.gif
asd asd ad feaf
apa
http
I want to wrap each of these items in a tag so that I can highlight images, hyperlinks, emails and plain text separately. I have tried different approaches but am not sure which is best. One of them uses RegExp, which I do not quite understand.
The end result should be:
<span>word</span>
<span>2014</span>
<a class="mail" href="mailto:john@doe.com">john@doe.com</a>
<a class="url" href="http://www.example.com/">http://www.example.com/</a>
<a class="img" href="http://example.com/image.gif">http://example.com/image.gif</a>
Match. This approach works, but it does not preserve the order of the text.
var arr = data.split("\n"), arr2, ext, i, j;
var imgs = '', urls = '', spans = ''; // one output string per category
for (i = 0; i < arr.length; i++)
{
    arr2 = arr[i].split(' ');
    for (j = 0; j < arr2.length; j++)
    {
        if (arr2[j].match(/\.(gif|png|jpg|jpeg)/))
        {
            // take the last four characters and drop the dot, e.g. ".gif" -> "gif"
            ext = arr2[j].substr(-4);
            ext = ext.replace(".","");
            imgs += '<a class="img '+ext+'" href="'+arr2[j]+'">'+arr2[j]+'</a>';
        }
        else if (arr2[j].match(/https?:/))
        {
            urls += '<a class="url" href="'+arr2[j]+'">'+arr2[j]+'</a>';
        }
        else
        {
            spans += '<span>'+arr2[j]+'</span>';
        }
    }
}
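For comparison, here is a minimal sketch of the same idea that keeps the original order by appending every tag to a single list instead of three separate strings. The function name wrapTokens and the simple email pattern are my own assumptions, not part of any library, and the tokens are not HTML-escaped here.

```javascript
// Sketch only: classify each whitespace-separated token and push the result
// onto ONE array, so the output order matches the input order.
// wrapTokens is a hypothetical name chosen for illustration.
function wrapTokens(data) {
    var tokens = data.split(/\s+/); // splits on spaces, tabs and line breaks
    var out = [];
    for (var i = 0; i < tokens.length; i++) {
        var t = tokens[i];
        if (t === '') continue; // leading/trailing whitespace yields ''
        var img = t.match(/^https?:\/\/\S+\.(gif|png|jpe?g)$/i);
        if (img) {
            out.push('<a class="img ' + img[1].toLowerCase() + '" href="' + t + '">' + t + '</a>');
        } else if (/^https?:\/\//i.test(t)) {
            out.push('<a class="url" href="' + t + '">' + t + '</a>');
        } else if (/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(t)) {
            // deliberately loose email check: something@something.tld
            out.push('<a class="mail" href="mailto:' + t + '">' + t + '</a>');
        } else {
            out.push('<span>' + t + '</span>');
        }
    }
    return out.join('\n');
}
```

Calling wrapTokens on the first example string should give the five tags from the desired result, in the same order as the input, with the image link also carrying its extension as a class.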
Regexp. I thought exp_all could match the opposite case, i.e. anything that does not contain http. However, that does not work.
var exp_img = /(https?:\/\/([\S]+?)\.(jpg|jpeg|png|gif))/g,
    exp_link = /([^"])(https?:\/\/([a-z-\.]+)+([a-z]{2,4})([\/\w-_]+)\/?)/g,
    exp_all = /^((?!http).)*$/g;
var text = data.replace(exp_all, '<span>$3</span>');
text = text.replace(exp_img, '<a class="img" href="$1">$1</a>');
text = text.replace(exp_link, '<a class="url" href="$2">$2</a>');
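One possible way around the "match everything except http" problem is to not negate at all: match every run of non-whitespace with a single regex and let a replacement callback decide which tag to emit. The sketch below assumes that tokens are separated by whitespace; linkify and escapeHtml are hypothetical names I made up for illustration.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
            .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Sketch: a single pass over every non-whitespace run. The callback picks
// the tag, and the whitespace between tokens is left exactly as it was.
function linkify(data) {
    return data.replace(/\S+/g, function (token) {
        var safe = escapeHtml(token);
        if (/^https?:\/\/\S+\.(?:gif|png|jpe?g)$/i.test(token)) {
            return '<a class="img" href="' + safe + '">' + safe + '</a>';
        }
        if (/^https?:\/\//i.test(token)) {
            return '<a class="url" href="' + safe + '">' + safe + '</a>';
        }
        return '<span>' + safe + '</span>';
    });
}
```

Because the text is replaced in a single pass, a later pattern never re-matches a URL that an earlier pattern has already wrapped in a tag, which is a risk when several replace calls are chained.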
So, I would appreciate advice on the best way to do this plain-text-to-HTML conversion. I would love it if there were a library for this. I looked at Markdown, but then I would still have to rewrite the text as Markdown, so I don't think it is an option.
And if possible, I would like to remove the "http://" prefix and make the output as clean and tidy as possible.
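To illustrate what I mean by removing the prefix, here is a small sketch. displayText is a hypothetical name, and dropping the trailing slash is my own assumption about what "clean" should look like. The idea is that only the visible link text is shortened, while the href keeps the full URL.

```javascript
// Sketch: shorten a URL for display only. The href should keep the full URL.
// displayText is a hypothetical helper name.
function displayText(url) {
    return url
        .replace(/^https?:\/\//i, '') // drop a leading http:// or https://
        .replace(/\/$/, '');          // drop one trailing slash, if present
}

// Example of combining it with a link tag:
function cleanLink(url) {
    return '<a class="url" href="' + url + '">' + displayText(url) + '</a>';
}
```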