Since Gmail does not provide an API for this information, it sounds like you want to build a web scraper.
Web scraping (also called web data collection or web data extraction) is a software technique for extracting information from websites.
There are many ways to do this, as described in the Wikipedia article linked above:
Human copy-and-paste: sometimes even the best web-scraping technology cannot replace a human's manual examination and copy-and-paste, and sometimes this is the only workable solution when the sites being scraped deliberately set up barriers to prevent automation.
Text grepping and regular expression matching: a simple but powerful approach to extracting information from web pages, based on the UNIX grep command or the regular expression facilities of languages such as Perl or Python (see the first sketch after this list).
HTTP programming: static and dynamic web pages can be retrieved by posting HTTP requests to the remote web server, using socket programming.
DOM parsing: by embedding a full-fledged web browser, such as Internet Explorer or the Mozilla browser control, programs can retrieve dynamic content generated by client-side scripts. These browser controls also parse web pages into a DOM tree, from which programs can extract parts of the pages.
HTML parsers: some semi-structured data query languages, such as the XML Query Language (XQL) and the Hypertext Query Language (HTQL), can be used to parse HTML pages and to retrieve and transform their content (see the second sketch after this list).
Web scraping software: many scraping tools are available that can be used to customize web-scraping solutions. They may provide a web recording interface that removes the need to write scraping code by hand, scripting functions for extracting and transforming web content, and database interfaces for storing the scraped data in local databases.
Semantic annotation recognizing: web pages may contain metadata or semantic markup/annotations that can be used to locate specific data snippets. If the annotations are embedded in the pages, as with Microformats, this technique can be viewed as a special case of DOM parsing. In the other case, the annotations are organized into a semantic layer that is stored and managed separately from the web pages, so scrapers can retrieve the data schema and instructions from this layer before scraping the pages.
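To make the first two approaches concrete, here is a minimal sketch of the grep/regex technique in Ruby (the stack I mention below); the URL and the pattern are purely illustrative, and a real page would need a pattern tailored to its markup:

require 'net/http'
require 'uri'

# Fetch a page over HTTP (hypothetical URL) and pull out every e-mail-like
# string with a regular expression -- quick, but brittle against markup changes.
body = Net::HTTP.get(URI('https://example.com/contacts'))
emails = body.scan(/[\w.+-]+@[\w-]+\.[\w.]+/)
puts emails.uniq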
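And here is a minimal sketch of the HTML/DOM parsing approach using Nokogiri; the URL, the markup, and the CSS selector are assumptions for illustration, not Gmail's actual structure:

require 'nokogiri'
require 'open-uri'

# Parse the page into a DOM tree and query it with CSS selectors instead of
# raw regexes (the 'td.contact-name' selector is hypothetical).
doc = Nokogiri::HTML(URI.open('https://example.com/contacts'))
doc.css('td.contact-name').each do |node|
  puts node.text.strip
end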
Before proceeding, also consider the legal implications of all this. I don't know whether scraping complies with Gmail's terms of service, and I would recommend checking them before moving forward. You might also get blacklisted or run into other problems.
With all of the above in mind, I would say that in your case you need some kind of spider plus a DOM parser to log in to Gmail and search for the data you need. Which tool to choose depends on your technology stack.
As a Ruby dev, I like to use Mechanize and Nokogiri. In PHP, you could take a look at solutions like Sphider.
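To give an idea of what that looks like, here is a rough Mechanize/Nokogiri sketch; the login URL, form field names, and CSS selectors are hypothetical placeholders (Gmail's real login flow is more involved and may actively block automation):

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'

# Log in via the site's HTML form (URL and field names are made up for illustration).
login_page = agent.get('https://mail.example.com/login')
form = login_page.form_with(action: /login/) || login_page.forms.first
form['Email']  = 'you@example.com'
form['Passwd'] = 'secret'
inbox = agent.submit(form)

# Mechanize pages expose the parsed Nokogiri document, so CSS selectors can be
# used to dig out the data you are after (the selector here is hypothetical).
inbox.search('table.message-list td.subject').each do |node|
  puts node.text.strip
end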