Intelligent web features and their algorithms (people you can follow, similar to you ...)

I have 3 main questions about the algorithms behind the intelligent web (Web 2.0).

Here is the book I'm reading, http://www.amazon.com/Algorithms-Intelligent-Web-Haralambos-Marmanis/dp/1933988665 , and I want to study the algorithms more deeply.

1. People you can follow (Twitter)

How do they determine the closest matches for me? What data do they collect? Which algorithms do they use?

2. How you're connected (LinkedIn)

Does the algorithm simply work like this? It draws a path between two nodes, me and another person, C: Me → A, B → A's connections → C. It isn't some brute-force algorithm or other similar graph algorithm, is it? :)

3. Similar to you (Twitter, Facebook). This seems similar to question 1. Does it just take the max(count) of mutual friends (Facebook) or the max(count) of mutual follows (Twitter)? Or do they implement some other algorithm? I think the second option is right, with a loop like this:

    counts = {}  # person -> number of contacts in common with me
    for person in contacts:
        counts[person] = len(common(person))
    return max(counts, key=counts.get)

But running this on every page refresh seems silly.
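Here is a runnable version of the loop in the question. It is a minimal sketch and not necessarily how Facebook or Twitter do it. It assumes the whole graph fits in a Python dict that maps each user to the set of their contacts. The names `graph` and `most_similar` are hypothetical. The sketch also chooses to skip people I am already connected to.

```python
def most_similar(graph, me, top_n=3):
    """Rank other users by how many contacts they share with `me`.

    `graph` is a hypothetical in-memory map: user id -> set of contacts.
    """
    mine = graph[me]
    scores = {}
    for person, contacts in graph.items():
        if person == me or person in mine:
            continue  # skip myself and people I am already connected to
        shared = len(mine & contacts)
        if shared:
            scores[person] = shared
    # highest shared count first; ties broken by name for a stable order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


graph = {
    "me": {"ann", "bob", "cat"},
    "dan": {"ann", "bob", "cat"},
    "eve": {"ann"},
    "fay": {"zed"},
    "ann": {"me"},
    "bob": set(),
    "cat": set(),
    "zed": set(),
}
print(most_similar(graph, "me"))  # [('dan', 3), ('eve', 1)]
```

As the question suggests, running this on every page view would be wasteful. A result like this would more likely be precomputed or cached.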

4. Did you mean (Google). I know they could implement it with a phonetic algorithm, http://en.wikipedia.org/wiki/Phonetic_algorithm , such as Soundex, http://en.wikipedia.org/wiki/Soundex , and here Google VP of Engineering and CIO Douglas Merrill talks about it: http://www.youtube.com/watch?v=syKY8CrHkck#t=22m03s
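Since Soundex comes up here, the following is a minimal Python sketch of the classic American Soundex code:

- The first letter is kept.
- The remaining consonants are mapped to digits.
- Repeated codes are collapsed.
- The result is padded to four characters.

It only illustrates the phonetic idea the question mentions. This thread does not establish whether Google uses anything like it.

```python
# Digit for each consonant group; vowels, Y, H and W get no digit.
CODES = {ch: digit
         for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6"))
         for ch in letters}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = CODES.get(letters[0], "")  # code of the last coded letter
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "HW":
            # A vowel resets `prev`, so a repeated code after a vowel is kept;
            # H and W are skipped without resetting it.
            prev = code
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
```

Words that share a code, such as "Robert" and "Rupert", could be offered as candidates for each other.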

What about the first 3 questions? Any ideas are welcome!

thanks

+4
4 answers

People you can follow

You can compute a score from several factors:

    factorA = getFactorA();  // say, 0.3
    factorB = getFactorB();  // say, 0.6
    factorC = getFactorC();  // say, 0.8
    result = (factorA + factorB + factorC) / 3;  // ≈ 0.567
    // if result is more than 0.5, suggest this person

So, in Twitter's case, “People you can follow” could be based on factors such as the following (user A is the user viewing the “People you can follow” feature; the real list may have more or fewer factors):

  • Similarity between the frequent keywords in user A's and user B's tweets
  • Similarity between the two users' profile descriptions
  • Proximity between the locations of users A and B
  • Do the people user A follows also follow user B?
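Here is a minimal sketch of the factor approach above, scoring one user pair. Everything about the data layout is hypothetical:

- Each user is a dict with `keywords` (a set), `bio` (a string) and `location` (a string).
- Keyword and bio similarity are measured with Jaccard overlap.
- Location is an exact match.
- The last factor is the fraction of A's followees who follow B.

The equal-weight average and the 0.5 threshold simply mirror the example above.

```python
def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|, or 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def follow_score(user_a, user_b, users, follows):
    """Average of four factors in [0, 1]; field names are hypothetical."""
    a, b = users[user_a], users[user_b]
    keywords = jaccard(a["keywords"], b["keywords"])
    bio = jaccard(set(a["bio"].lower().split()), set(b["bio"].lower().split()))
    location = 1.0 if a["location"] == b["location"] else 0.0
    followees = follows.get(user_a, set())
    # fraction of the people A follows who also follow B
    if followees:
        social = sum(user_b in follows.get(f, set()) for f in followees) / len(followees)
    else:
        social = 0.0
    return (keywords + bio + location + social) / 4


users = {
    "alice": {"keywords": {"python", "ml", "data"}, "bio": "data scientist", "location": "Berlin"},
    "bob": {"keywords": {"python", "data"}, "bio": "data engineer", "location": "Berlin"},
}
follows = {"alice": {"carol", "dave"}, "carol": {"bob"}, "dave": set()}

score = follow_score("alice", "bob", users, follows)
print(round(score, 3), score > 0.5)  # 0.625 True
```

In practice the factors would probably be weighted rather than averaged equally. The weights are exactly the part the answer leaves open.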

So where does the “People you can follow” list come from? It probably combines people with a large number of followers (probably celebrities, alpha geeks, well-known products/services, etc.) with the people followed by [the people user A follows].
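Here is a minimal sketch of where the candidate list could come from under this guess. It takes accounts with many followers, adds the people followed by the people user A follows, and removes A and anyone A already follows. The graph layout (a dict from each user to the set of accounts they follow) and the `popular_threshold` value are assumptions for illustration.

```python
from collections import Counter


def candidates(user_a, follows, popular_threshold=3):
    """Popular accounts plus friends-of-friends, minus accounts A already knows.

    `follows` maps each user to the set of accounts they follow.
    """
    follower_counts = Counter(u for followees in follows.values() for u in followees)
    popular = {u for u, n in follower_counts.items() if n >= popular_threshold}
    mine = follows.get(user_a, set())
    second_hop = set()
    for followee in mine:
        second_hop |= follows.get(followee, set())  # who my followees follow
    return (popular | second_hop) - mine - {user_a}


follows = {
    "a": {"b"},
    "b": {"c", "star"},
    "x": {"star"},
    "y": {"star"},
}
print(sorted(candidates("a", follows)))  # ['c', 'star']
```

The popular accounts also give a brand-new user, who follows nobody yet, something to see.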

Basically, some level of data mining is needed here: reading tweets and bios and doing the calculations. This can be done in a daily or weekly cron job at off-peak hours, when server load is lower (or perhaps 24/7 on a separate server).

How you're connected

There is probably some clever work here that makes you feel as though a lot of brute force went into finding the path. However, after some superficial research, I found it to be simple:

Say you are user A, user B is your connection, and user C is a connection of user B.

To visit user C, you first need to visit user B's profile. When you visit user B's profile, the website already records that user A is on user B's profile. Therefore, when you go from user B to user C, the website can immediately tell you “User A → User B → User C”, ignoring all other possible paths.

This is the maximum level: once at user C, user A cannot continue browsing C's connections until user C connects with user A.
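This answer's theory can be modeled as a per-session trail of visited profiles. Each visit to a connection of the previous profile extends the trail, up to the maximum level. The class and its names are hypothetical, and this is not LinkedIn's actual implementation. Note that the model cannot produce a path when you reach user C some other way, for example through search. That case would need a real graph search.

```python
class BrowsingTrail:
    """Per-session trail of visited profiles (hypothetical model of the theory).

    `connections` maps each member to the set of their direct connections.
    """

    def __init__(self, viewer, connections, max_hops=2):
        self.viewer = viewer
        self.connections = connections
        self.max_hops = max_hops  # A -> B -> C is two hops
        self.path = [viewer]

    def visit(self, profile):
        """Record a profile view; return the path to display, or None."""
        last = self.path[-1]
        hops = len(self.path) - 1
        if hops < self.max_hops and profile in self.connections.get(last, set()):
            self.path.append(profile)  # followed a link from the previous profile
        elif profile in self.connections.get(self.viewer, set()):
            self.path = [self.viewer, profile]  # back to one of my own connections
        else:
            self.path = [self.viewer]  # no known trail to this profile
            return None
        return " → ".join(self.path)


connections = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}}
trail = BrowsingTrail("A", connections)
print(trail.visit("B"))  # A → B
print(trail.visit("C"))  # A → B → C
print(trail.visit("D"))  # None: beyond the maximum level
```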

Source: observation of LinkedIn

Similar to you

This is the same as #1 (People you can follow), except that the algorithm reads a different list of people: the people you follow.

Did you mean

You got it right there, except that Google probably uses more than just Soundex: language translation, word replacement and many other algorithms are involved. I can't comment much, since it is likely very complicated and I am not a specialist in language processing.
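As one example of those "many other algorithms", here is a minimal sketch of edit-distance spelling suggestion. It proposes the known word with the smallest Levenshtein distance to the query and prefers more frequent words on ties. This is a textbook technique, not a description of Google's system, and the word counts are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, free if equal
        prev = cur
    return prev[-1]


def did_you_mean(query, word_counts, max_distance=2):
    """Closest known word by edit distance; ties go to the more frequent word."""
    if query in word_counts:
        return None  # already a known word, nothing to suggest
    best = None
    for word, count in word_counts.items():
        d = levenshtein(query, word)
        if d <= max_distance:
            key = (d, -count)
            if best is None or key < best[0]:
                best = (key, word)
    return best[1] if best else None


word_counts = {"algorithm": 120, "algorithms": 80, "logarithm": 40}  # made-up counts
print(did_you_mean("algoritm", word_counts))  # algorithm
```

A real system would not scan the whole dictionary for every query. Candidates would typically be pre-indexed, and a phonetic code such as Soundex is one possible index key.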

If we research the Google infrastructure a bit more, we may find that Google has servers dedicated to spelling and translation services. You can learn more about the Google platform at http://en.wikipedia.org/wiki/Google_platform .

Conclusion

The key to computationally heavy algorithms is caching. Once the result is cached, you do not need to recompute it on every page load. Google does it, Stack Overflow does it (on most pages with a list of questions), and it would not be surprising if Twitter does too!
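Here is a minimal sketch of the caching idea: a small time-to-live cache around an expensive suggestion function, so repeated page loads within the TTL reuse the stored result. `compute_suggestions` and the 60-second TTL are hypothetical placeholders. A real site would more likely use a shared cache server than an in-process dict.

```python
import time


class TTLCache:
    """Keep each computed value for `ttl` seconds, then recompute on demand."""

    def __init__(self, compute, ttl=3600.0, clock=time.monotonic):
        self.compute = compute
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive computation
        value = self.compute(key)
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def compute_suggestions(user):
    """Hypothetical stand-in for an expensive recommendation job."""
    calls.append(user)
    return [f"suggestion-for-{user}"]


cache = TTLCache(compute_suggestions, ttl=60)
cache.get("alice")  # computed
cache.get("alice")  # served from the cache
print(len(calls))  # 1
```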

In the end, the algorithms are mostly up to the developers. You can use existing algorithms, but you can also create your own.

+7

People you can follow

This may be one of many kinds of recommendation algorithm, perhaps collaborative filtering?
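Here is a minimal sketch of user-based collaborative filtering over "who follows whom". Each user is treated as a binary vector of the accounts they follow, and cosine similarity between those vectors is computed. A candidate account scores the summed similarity of the users who follow it. The data and function names are hypothetical; this only illustrates the technique the answer names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(me, follows, top_n=3):
    """User-based collaborative filtering over the follow graph."""
    mine = follows[me]
    scores = {}
    for other, theirs in follows.items():
        if other == me:
            continue
        sim = cosine(mine, theirs)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for account in theirs - mine - {me}:
            scores[account] = scores.get(account, 0.0) + sim
    return sorted(scores, key=lambda acc: (-scores[acc], acc))[:top_n]


follows = {
    "me": {"ann", "bob"},
    "u1": {"ann", "bob", "cat"},
    "u2": {"ann", "dan"},
    "u3": {"zed"},
}
print(recommend("me", follows))  # ['cat', 'dan']
```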

How are you connected

This is just a shortest-path algorithm on the social graph. Assuming the connections have no weights, it would simply use breadth-first search.
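Here is a minimal breadth-first search sketch that returns the shortest chain of connections. It assumes an undirected, unweighted graph stored as a dict from each member to the set of their direct connections. The graph and names are hypothetical.

```python
from collections import deque


def connection_path(graph, start, goal):
    """Shortest chain of connections from start to goal (breadth-first search).

    Returns None if no path exists.
    """
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


graph = {
    "me": {"A"},
    "A": {"me", "B"},
    "B": {"A", "C"},
    "C": {"B"},
}
print(" → ".join(connection_path(graph, "me", "C")))  # me → A → B → C
```

On a graph the size of a real social network, the search would likely be capped at a few degrees. Searching from both ends at once (bidirectional BFS) is another way to keep the work bounded.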

Similar to you

Just rearrange the dataset and use the same algorithm as for People you can follow.

Check out the book Programming Collective Intelligence for a good look at the kinds of algorithms used for People you can follow and Similar to you; it also has great Python code.

+2
  • People you can follow: from the Twitter blog, “suggestions are based on several factors, including people you follow and the people they follow” http://blog.twitter.com/2010/07/discovering-who-to-follow.html . So if you follow A and B, and they both follow C, Twitter will suggest C to you...
  • How you're connected: I think you answered this one yourself.
  • Similar to you: as above, and as you say, although the results are probably cached, so it is computed only once per session, or perhaps even less often...
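The rule described from the Twitter blog (if you follow A and B and both follow C, suggest C) can be sketched as a simple count. For each account, count how many of the people I follow also follow it. This is a hypothetical illustration of that one factor only, since the blog says several factors are combined.

```python
from collections import Counter


def who_to_follow(me, follows, top_n=3):
    """Rank accounts by how many of the people I follow also follow them.

    `follows` maps each user to the set of accounts they follow.
    """
    mine = follows.get(me, set())
    counts = Counter(
        account
        for followee in mine
        for account in follows.get(followee, set())
        if account != me and account not in mine
    )
    # highest count first; ties broken alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


follows = {"me": {"A", "B"}, "A": {"C", "D"}, "B": {"C"}}
print(who_to_follow("me", follows))  # [('C', 2), ('D', 1)]
```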

Hope this helps, Chris

+1

I do not use Twitter, but with that in mind:

1). At first glance, this is not that difficult: for every person I follow, see who they follow. Then, for each of the people they follow, see who they follow, and so on. The deeper you go, the more number crunching is required.

You can take this a little further if you can also efficiently compute the reverse: for the people I follow, who else follows them?

In both cases, what is missing is a way to weigh tweeters to see whether they are people I would really like to follow: a liberal may also follow a conservative tweeter, but that does not mean I would like to follow a conservative (see No. 3).

2). Not sure; still thinking about it...

3). Assuming the bio and tweets are all there is to go on, the hard parts are:

  • Determining which attributes should exist (political affiliation, topic types, etc.).
  • Cleaning up each 140-character tweet before feeding it in as data.
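Here is a minimal sketch of the tweet clean-up step. It lowercases the text, strips URLs and @mentions, keeps hashtag words without the "#", and drops stopwords. The tiny stopword list (including "rt") is a hypothetical placeholder, and real pipelines would need much more, such as non-English text and emoji.

```python
import re

STOPWORDS = {"a", "an", "the", "and", "or", "to", "of", "in", "on", "is", "it", "for", "rt"}
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
WORD = re.compile(r"[a-z0-9']+")


def clean_tweet(text):
    """Lowercase, drop URLs and @mentions, keep hashtag words, remove stopwords."""
    text = URL.sub(" ", text.lower())
    text = MENTION.sub(" ", text)
    text = text.replace("#", " ")  # "#python" -> "python"
    return [w for w in WORD.findall(text) if w not in STOPWORDS]


print(clean_tweet("RT @bob: Loving the new #Python release! http://t.co/xyz"))
# ['loving', 'new', 'python', 'release']
```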

Once you have the right set of attributes, two different algorithms come to mind:

  • K-means clustering to determine which attributes I tend to favor.
  • N-nearest neighbors to find the N tweeters most similar to me, given the attributes I tend to put weight on.
  • EDIT: Actually, a decision tree is probably the best way to do all this ...
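Here is a minimal sketch of the nearest-neighbor idea (k-nearest-neighbor search). It finds the N tweeters closest to me under a weighted Euclidean distance, where the weights express "the attributes I tend to put weight on". The attribute names, scores and weights are hypothetical. In practice they would come out of the attribute-extraction step described above.

```python
import math


def weighted_distance(u, v, weights):
    """Euclidean distance where each attribute is scaled by its weight."""
    return math.sqrt(sum(w * (u[k] - v[k]) ** 2 for k, w in weights.items()))


def nearest_tweeters(me, profiles, weights, n=2):
    """The n profiles closest to `me` under the weighted distance."""
    others = [name for name in profiles if name != me]
    others.sort(key=lambda name: (weighted_distance(profiles[me], profiles[name], weights), name))
    return others[:n]


# Hypothetical attribute scores in [0, 1].
profiles = {
    "me": {"politics": 0.2, "tech": 0.9, "sports": 0.1},
    "geek": {"politics": 0.1, "tech": 0.8, "sports": 0.0},
    "pundit": {"politics": 0.9, "tech": 0.3, "sports": 0.1},
    "fan": {"politics": 0.2, "tech": 0.1, "sports": 0.9},
}
weights = {"politics": 2.0, "tech": 1.0, "sports": 0.5}
print(nearest_tweeters("me", profiles, weights))  # ['geek', 'fan']
```

Changing the weights changes who counts as "similar". With only `tech` weighted, for example, "pundit" moves ahead of "fan".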

It's all speculative, but it sounds like fun if someone is paying you to do it.

+1

Source: https://habr.com/ru/post/1332072/

