C# library for a web crawler and FTP crawler

I need a library (hopefully in C#!) that works as a web crawler for accessing files over HTTP and FTP. For a start, I am happy with just reading HTML; later I want to extend it to PDF, Word, etc.

I would be happy with open source software, even an older project, or at least with pointers to documentation.

2 answers

Check out the NCrawler project:

A simple and highly efficient multithreaded web crawler with pipeline-based processing, written in C#. It contains HTML, text, PDF, and IFilter document processors, plus language detection (Google). It is easy to add pipeline steps to extract, use, and alter information.


I developed the Crawler-Lib Engine for the Crawler-Lib framework. It is a full-featured crawler that can easily be extended to any kind of request, or even to whatever processing you want to have.

Here is the engine: http://www.crawler-lib.net/crawler-lib-engine

Here is a YouTube channel with videos showing how the Crawler-Lib Engine works: http://www.youtube.com/user/CrawlerLib

I know that this project is not open source, but there is a free version.


Source: https://habr.com/ru/post/1770170/

