Cool Summer 2012 Internship Problem Set 5

The Cool Summer Internship Race is getting closer and closer to the finish line. A big thumbs-up for all those who have already sent their solutions to previously posted problems and a piece of advice to those willing to do so: hurry up, we only have 2 sets of problems left! But we still have positions to fill. 🙂

This week is going to be more social. 🙂

Do not forget about the rules. You can solve the problem in a small team! The deadline to submit your solutions is Wednesday, May 2nd, 10 am GMT. Results must be submitted here.

Connect People

You are designing a new social network. Unlike social networks where people connect by friendship request, your social network connections are rendered automatically, based on mutual interest.
What you know is:

Each person has an associated list of interests O1… Oi
The list is not static (people can add interests or subscribe to other interests that had been previously added by others)
The more common interests between people, the better the chances to have them connect. On the other hand, when two people are connected based on mutual interest, their connections up to level three are to be considered possible candidates for connection.

What you have to do

Describe a data storage model. How would you store the list of people, their interests and the connections?
Describe an algorithm for discovering best connections, based on mutual interests and existing friendships.

Suggestions

You will store a large amount of data, comprising information on millions of people and millions of interests.
You should be able to retrieve potential connections in real time from a friendly data structure.
Most likely you cannot identify potential connections in real time! So you should have some workers to process the information. It’s best to design an algorithm that can be easily distributed across multiple workers.
Consider the weight! Some connections are better than others (you should settle your own criteria). What’s more important for you?

Wikipedia Test

You joined a project to find quality issues with Wikipedia web infrastructure (not content). Your job is basically to discover bugs in Wikipedia. Any type of bug is appreciated.
You are not aware of formal testing procedures, but you must organize your work in order to find problems of significant importance.

How do you plan to do it?

What you have to do

Write a plan for the task.
Show us how you would execute a part of this plan.

Suggestions

In testing, automation is important. Wikipedia is web based; do you see a black box?
There are many types of potential issues. An important issue might be more valuable than one hundred minor findings.
Security is very important. Make sure you cover it!
Wikipedia is one of the busiest websites on the Internet. How would you stress it to see how the current infrastructure scales?

We wish you a great weekend and we look forward to receiving your creative solutions.

2 Comments

You can post comments in this post.

Hello,

I have a question about the Connect People problem from Problem set 5.

I understand that the more common interests are between 2 people (for example), the greater the chances that these 2 people will get connected. Now, this sentence is bugging me: “On the other hand, when two people are connected based on mutual interest, their connections up to level three are to be considered possible candidates for connection.”

What does “connection up to level three” mean? Maybe I am missing something…

Tank you,
Have a very nice day!

Vali 13 years ago Reply
Hi,

You are connected to John. John is a first level connection to you.
John is connected to Anne and Anne is not connected to you. Anne is a second level connection to you.
Anne is connected to Emily and Emily is not connected to John or you. Emily is a second level connection to John and a third level connection to you.

Hope this is clear.

Blog wizard 12 years ago Reply