The Cool Summer Internship Race is getting closer and closer to the finish line. A big thumbs-up for all those who have already sent their solutions to previously posted problems and a piece of advice to those willing to do so: hurry up, we only have 2 sets of problems left! But we still have positions to fill. 🙂
This week is going to be more social. 🙂
Do not forget about the rules. You can solve the problem in a small team! The deadline to submit your solutions is Wednesday, May 2nd, 10 am GMT. Results must be submitted here.
You are designing a new social network. Unlike social networks where people connect by friendship request, your social network connections are rendered automatically, based on mutual interest.
What you know is:
- Each person has an associated list of interests O1… Oi
- The list is not static (people can add interests or subscribe to other interests that had been previously added by others)
- The more common interests between people, the better the chances to have them connect. On the other hand, when two people are connected based on mutual interest, their connections up to level three are to be considered possible candidates for connection.
What you have to do
Describe a data storage model. How would you store the list of people, their interests and the connections?
Describe an algorithm for discovering best connections, based on mutual interests and existing friendships.
You will store a large amount of data, comprising information on millions of people and millions of interests.
You should be able to retrieve potential connections in real time from a friendly data structure.
Most likely you cannot identify potential connections in real time! So you should have some workers to process the information. It’s best to design an algorithm that can be easily distributed across multiple workers.
Consider the weight! Some connections are better than others (you should settle your own criteria). What’s more important for you?
You joined a project to find quality issues with Wikipedia web infrastructure (not content). Your job is basically to discover bugs in Wikipedia. Any type of bug is appreciated.
You are not aware of formal testing procedures, but you must organize your work in order to find problems of significant importance.
How do you plan to do it?
What you have to do
Write a plan for the task.
Show us how you would execute a part of this plan.
In testing, automation is important. Wikipedia is web based; do you see a black box?
There are many types of potential issues. An important issue might be more valuable than one hundred minor findings.
Security is very important. Make sure you cover it!
Wikipedia is one of the busiest websites on the Internet. How would you stress it to see how the current infrastructure scales?
We wish you a great weekend and we look forward to receiving your creative solutions.