Inverted Index in C# Shared Collections

(Sorry if the title is a red herring, by the way.)

Background:

I am developing a map of all the tweets in the world in real time using the Twitter Streaming API and ASP.NET SignalR. I use the Tweetinvi C# Twitter library to asynchronously push tweets to the browser via SignalR. Everything works as expected; see http://dev.wherelionsroam.co.uk to get an idea of it.

The next development step involves parsing the text of each tweet with the Stanford Natural Language Parsing library (http://nlp.stanford.edu/software/corenlp.shtml), in particular the Named Entity Recognizer (also called CRFClassifier), so that I can extract meaningful metadata from each tweet (e.g. the people, places and organizations mentioned). The desired result is that I can identify the people, places and organizations that many people are talking about (similar to the "Trending" concept) and broadcast them to all clients using SignalR. I know that the Twitter API has GET trends methods, but that wouldn't be any fun, would it?!

Here are the main classes in my application:

Main classes:

TweetModel.cs (contains all the tweet information passed to it from the Streaming API):

public class TweetModel
{
    public string User { get; set; }
    public string Text { get; set; }
    public DateTime CreatedAt { get; set; }
    public string ImageUrl { get; set; }
    public double Longitude { get; set; }
    public double Latitude { get; set; }
    public string ProfileUrl { get; set; }

    // This field is set later during Tokenization / Named Entity Recognition
    public List<NamedEntity> entities = new List<NamedEntity>();
}

Abstract class NamedEntity:

/// <summary>
/// Abstract modelling class for NER tagging - overridden by specific named entities. Used here so that all classes inherit from a single base class - polymorphic list
/// </summary>
public abstract class NamedEntity
{
    protected string _name;
    public abstract string Name { get; set; }
}

Person, which inherits from NamedEntity:

public class Person : NamedEntity
{
    public override string Name
    {
        get
        {
            return _name;
        }
        set
        {
            _name = value;
        }
    }
    public string entityType = "Person";
}

TweetParser:

public class TweetParser
{
    // Static List to hold all of the tweets (and their entities) - tweets older than 20 minutes are cleared out
    public static List<TweetModel> tweets = new List<TweetModel>();

    public TweetParser(TweetModel tweet)
    {
        ProcessTweet(tweet);
        // Removed all of the NER logic from this class
    }
}
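The comment above says tweets older than 20 minutes are cleared out, but the clean-up code is not shown. If tweets are appended from the streaming callback while other code reads or prunes the same static List<TweetModel>, keep in mind that List<T> is not safe for concurrent modification. Below is a minimal sketch of a lock-protected store that prunes by age; RecentItemStore, Add, Prune and Snapshot are hypothetical names, not part of Tweetinvi or SignalR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the question): a lock-protected list that keeps
// only items newer than a fixed window, e.g. 20 minutes of tweets.
public class RecentItemStore<T>
{
    private readonly object _gate = new object();
    private readonly List<(DateTime CreatedAt, T Item)> _items =
        new List<(DateTime CreatedAt, T Item)>();
    private readonly TimeSpan _window;

    public RecentItemStore(TimeSpan window)
    {
        _window = window;
    }

    public void Add(T item, DateTime createdAt)
    {
        lock (_gate)
        {
            _items.Add((createdAt, item));
        }
    }

    // Removes everything older than (now - window); returns how many items were dropped.
    public int Prune(DateTime now)
    {
        DateTime cutoff = now - _window;
        lock (_gate)
        {
            return _items.RemoveAll(entry => entry.CreatedAt < cutoff);
        }
    }

    // Returns a copy so callers can enumerate it without holding the lock.
    public List<T> Snapshot()
    {
        lock (_gate)
        {
            return _items.ConvertAll(entry => entry.Item);
        }
    }
}
```

With the question's types this could be a `new RecentItemStore<TweetModel>(TimeSpan.FromMinutes(20))`, calling `Add(tweet, tweet.CreatedAt)` per incoming tweet and `Prune(...)` from a timer; make sure the "now" passed in uses the same time zone convention as CreatedAt.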

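Problem:

I am passing the tweet text to the NER, and it correctly returns results such as "PERSON" "Luis Suarez" and "PLACE" "New York". I have created a NamedEntity subclass for each type of entity that the NER recognises (PERSON, LOCATION, ORGANISATION).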
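One way to turn each NER label into the matching NamedEntity subclass is a lookup table from label to constructor. This is a sketch under assumptions: Location and Organisation are hypothetical siblings of the question's Person class, the classes are trimmed down to a Name property, NamedEntityFactory is a hypothetical name, and the exact label strings (for example ORGANIZATION versus ORGANISATION) should be checked against what the classifier actually emits.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down entity classes; Location and Organisation are hypothetical.
public abstract class NamedEntity
{
    public string Name { get; set; }
}

public class Person : NamedEntity { }
public class Location : NamedEntity { }
public class Organisation : NamedEntity { }

public static class NamedEntityFactory
{
    // Label -> constructor. Both spellings of organisation are accepted because the
    // exact label depends on the model; labels are matched case-insensitively.
    private static readonly Dictionary<string, Func<NamedEntity>> Constructors =
        new Dictionary<string, Func<NamedEntity>>(StringComparer.OrdinalIgnoreCase)
        {
            ["PERSON"] = () => new Person(),
            ["LOCATION"] = () => new Location(),
            ["ORGANIZATION"] = () => new Organisation(),
            ["ORGANISATION"] = () => new Organisation(),
        };

    // Returns null for labels that are not modelled (e.g. "O" for non-entity tokens).
    public static NamedEntity Create(string nerLabel, string text)
    {
        if (!Constructors.TryGetValue(nerLabel, out var create))
            return null;

        NamedEntity entity = create();
        entity.Name = text;
        return entity;
    }
}
```

Adding a new entity type then means one new subclass and one new table entry, with no changes to the parsing loop.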

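Question:

How can I build an "inverted index" (I'm not sure that's the right term) over these NamedEntity objects (currently a List<NamedEntity> inside each TweetModel), so that I can find the "most talked about" entities? At the moment the relationship only goes one way, TweetModel > List<NamedEntity>, and I need to go from each entity to the tweets that mention it. Any help would be appreciated, thanks!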
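An inverted index here is simply the reverse of the TweetModel > List<NamedEntity> relationship: a dictionary from entity name to the tweets that mention it, where the "trending" entities are the keys with the longest lists. A minimal generic sketch, assuming entities are matched by case-insensitive name; InvertedIndex, Build and Top are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InvertedIndex
{
    // Builds key -> documents from document -> keys (e.g. entity name -> tweets
    // from tweet -> entities). Each document appears at most once per key.
    public static Dictionary<string, List<TDoc>> Build<TDoc>(
        IEnumerable<TDoc> documents,
        Func<TDoc, IEnumerable<string>> keysOf)
    {
        var index = new Dictionary<string, List<TDoc>>(StringComparer.OrdinalIgnoreCase);
        foreach (TDoc doc in documents)
        {
            // Distinct() stops a tweet that repeats a name from being counted twice.
            foreach (string key in keysOf(doc).Distinct(StringComparer.OrdinalIgnoreCase))
            {
                if (!index.TryGetValue(key, out List<TDoc> postings))
                {
                    postings = new List<TDoc>();
                    index[key] = postings;
                }
                postings.Add(doc);
            }
        }
        return index;
    }

    // The "most talked about" keys: longest posting lists first, ties broken by key.
    public static List<(string Key, int Count)> Top<TDoc>(
        Dictionary<string, List<TDoc>> index, int n)
    {
        return index
            .Select(pair => (Key: pair.Key, Count: pair.Value.Count))
            .OrderByDescending(entry => entry.Count)
            .ThenBy(entry => entry.Key, StringComparer.Ordinal)
            .Take(n)
            .ToList();
    }
}
```

With the question's classes the call would be something like `InvertedIndex.Build(TweetParser.tweets, t => t.entities.Select(e => e.Name))`; one simple option is to rebuild it after each 20-minute prune. Keying on the name alone merges, say, a Person and a Location that are both called "Paris", which may or may not be what you want.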

The full src is available on GitHub: https://github.com/adaam2/FinalUniProject

Answer 1:

1- Add a List<TweetModel> to NamedEntity:

public abstract List<TweetModel> Tweets { get; set; }

2- Create the NamedEntity objects during tokenization.

3- When you add a NamedEntity to a TweetModel, also add the TweetModel to that NamedEntity:

// Inside TweetModel; "tokenizedPerson" is a placeholder for the result of the tokenization
Person p = tokenizedPerson;
entities.Add(p);
p.Tweets.Add(this);

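That way the relationship goes both ways, and you can find the "most talked about" entities by counting the tweets in each entity's Tweets list.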
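One detail the steps above leave open: if tokenization creates a brand-new Person object for every mention, each Tweets list only ever holds a single tweet. A sketch that reuses one Person instance per name, assuming case-insensitive name matching; PersonRegistry, GetOrCreate and Link are hypothetical names, and the classes are trimmed-down versions of the question's. It also puts a concrete Tweets property on the base class instead of the abstract one, so subclasses don't each have to implement it.

```csharp
using System;
using System.Collections.Generic;

// Trimmed-down versions of the question's classes, plus the answer's Tweets list.
public class TweetModel
{
    public string Text { get; set; }
    public List<NamedEntity> entities = new List<NamedEntity>();
}

public abstract class NamedEntity
{
    public string Name { get; set; }
    public List<TweetModel> Tweets { get; } = new List<TweetModel>();
}

public class Person : NamedEntity { }

// Hypothetical registry: hands out one shared Person per name, so every tweet
// mentioning the same person is appended to the same Tweets list.
public class PersonRegistry
{
    private readonly Dictionary<string, Person> _byName =
        new Dictionary<string, Person>(StringComparer.OrdinalIgnoreCase);

    public Person GetOrCreate(string name)
    {
        if (!_byName.TryGetValue(name, out Person person))
        {
            person = new Person { Name = name };
            _byName[name] = person;
        }
        return person;
    }

    // Links both directions: tweet -> entity and entity -> tweet.
    public void Link(TweetModel tweet, string personName)
    {
        Person p = GetOrCreate(personName);
        tweet.entities.Add(p);
        p.Tweets.Add(tweet);
    }
}
```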

Answer 2:

You need a way to tell when two Person objects refer to the same person.

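One way is to compute a hash (a normalised key) for each Person and use it to look up all of the matching entries; call it MyHashFunctionForPerson.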

Pseudo-code:

Dictionary<string, List<Person>> map = new Dictionary<string, List<Person>>();

List<Person> FindMatches(Person p)
{
  string h = MyHashFunctionForPerson(p);
  if (!map.ContainsKey(h))
    map[h] = new List<Person>();
  map[h].Add(p);
  return map[h];
}

MyHashFunction could also be moved into NamedEntity. Alternatively, override Equals and GetHashCode.
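Overriding Equals and GetHashCode lets NamedEntity itself be the dictionary key, so the pseudo-code's map could become a Dictionary<NamedEntity, List<TweetModel>> with no separate hash function. A sketch, assuming two entities match when they are the same subclass and their names are equal after trimming and ignoring case; Location is hypothetical. Making Name constructor-only is a deliberate change from the question's settable property: renaming an entity after it is stored as a key would leave it in the wrong hash bucket.

```csharp
using System;
using System.Collections.Generic;

public abstract class NamedEntity : IEquatable<NamedEntity>
{
    public string Name { get; }

    protected NamedEntity(string name)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
    }

    // Assumed normalisation: trim and ignore case. Matching "Suarez" with
    // "Luis Suarez" would need something smarter than this.
    private string Key => Name.Trim().ToUpperInvariant();

    // Equal only if same subclass and same normalised name, so a Person "Paris"
    // and a Location "Paris" stay distinct.
    public bool Equals(NamedEntity other) =>
        other != null && other.GetType() == GetType() && other.Key == Key;

    public override bool Equals(object obj) => Equals(obj as NamedEntity);

    // Must agree with Equals: built from the same two components.
    public override int GetHashCode() => (GetType(), Key).GetHashCode();
}

public class Person : NamedEntity
{
    public Person(string name) : base(name) { }
}

public class Location : NamedEntity
{
    public Location(string name) : base(name) { }
}
```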

Source: https://habr.com/ru/post/1547146/

