idio has two offices in the UK, the first in Exeter and the second in London. As avid and curious Twitter users, we'd like to find out whether the population of Exeter or that of London is better at spelling. Using the Twitter API, aggregate and analyse tweets from Exeter and London.
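The brief leaves the definition of a "London tweet" or an "Exeter tweet" open. One possible approach, sketched below in PHP 7.1+, assumes the tweet's latitude and longitude are already known (i.e. the tweet is geotagged) and tests them against a rough bounding box for each city. The box coordinates and the `classify_location` function are hypothetical and only approximate. Tweets without coordinates would need a different rule.

```php
<?php
// Hypothetical, approximate bounding boxes: [south, west, north, east] in
// decimal degrees. Rough illustrative values, not official city boundaries.
const CITY_BOXES = [
    'London' => [51.28, -0.51, 51.70, 0.33],
    'Exeter' => [50.67, -3.58, 50.76, -3.45],
];

/**
 * Return the name of the city whose bounding box contains the point,
 * or null if the point falls outside every box.
 */
function classify_location(float $lat, float $lon): ?string
{
    foreach (CITY_BOXES as $city => [$south, $west, $north, $east]) {
        if ($lat >= $south && $lat <= $north && $lon >= $west && $lon <= $east) {
            return $city;
        }
    }
    return null;
}

var_dump(classify_location(51.5074, -0.1278)); // string(6) "London"
var_dump(classify_location(50.7184, -3.5339)); // string(6) "Exeter"
var_dump(classify_location(53.4808, -2.2426)); // NULL (Manchester)
```

A bounding box is crude but cheap; a polygon or radius test would follow the city boundaries more closely.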
Assumptions
- Hashtags and @ replies contained in tweets can be ignored; they should not affect the overall spelling quality of a tweet.
- How you classify a tweet as coming from London or from Exeter is up to you.
- How you score individual tweets is up to you.
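A minimal sketch of the first and third points, assuming a word list is available to check against: hashtags and @ mentions are stripped with a regular expression, the remaining text is split into words, and the tweet's score is the fraction of those words found in the word list. The function names and the tiny `$dictionary` are hypothetical (PHP 7.1+); this is one possible scoring rule, not a prescribed one.

```php
<?php
/**
 * Remove hashtags (#topic) and @ mentions/replies (@user) so that they
 * do not affect the spelling score.
 */
function remove_hashtags_and_mentions(string $text): string
{
    return trim(preg_replace('/[#@]\w+/', '', $text));
}

/**
 * Score a tweet as the fraction of its words that appear in $dictionary,
 * an array used as a set (word => true). Returns null when no words remain.
 */
function spelling_score(string $tweet, array $dictionary): ?float
{
    $text = strtolower(remove_hashtags_and_mentions($tweet));
    // Words are runs of letters, optionally joined by apostrophes ("don't").
    preg_match_all("/[a-z]+(?:'[a-z]+)*/", $text, $matches);
    $words = $matches[0];
    if (count($words) === 0) {
        return null;
    }
    $known = 0;
    foreach ($words as $word) {
        if (isset($dictionary[$word])) {
            $known++;
        }
    }
    return $known / count($words);
}

// Hypothetical tiny word list; a real run would load a full dictionary file.
$dictionary = array_fill_keys(['i', 'love', 'this', 'city', 'weather'], true);

var_dump(spelling_score('I luv this city #london @friend', $dictionary)); // float(0.75)
var_dump(spelling_score('#tbt @someone', $dictionary));                   // NULL
```

Averaging these scores over each city's tweets gives one simple city-level comparison; tweets with no scorable words return null and can be skipped.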
Deliverables
- Full source code in PHP, Ruby or Python, along with instructions on how to run it ourselves.
- Results of the analysis, along with any comments you wish to make and a short description of the decisions you made whilst programming your solution, written in either Markdown or Textile.
- Upload the source and description as a Gist on GitHub.
One of the things we do at idio is to group articles, so that users who are interested in a particular topic can easily and quickly find a range of articles on that topic. Attached to this task, you will find a small sample of content. It is your job to write an algorithm - in PHP, Python or Ruby - that will cluster these articles by topic.
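One simple technique, sketched here under stated assumptions and not prescribed by the task, is to turn each article into a term-frequency vector, compare articles by cosine similarity, and group them with single-pass "leader" clustering: each article joins the first cluster whose leader is similar enough, otherwise it starts a new cluster. The stop-word list, the 0.3 threshold, the sample articles and all function names are hypothetical.

```php
<?php
// Hypothetical minimal stop-word list; a real run would use a fuller one.
const STOP_WORDS = ['the', 'a', 'an', 'and', 'of', 'to', 'in', 'is', 'on', 'for'];

/** Build a term-frequency vector (word => count) from raw article text. */
function term_vector(string $text): array
{
    preg_match_all('/[a-z]+/', strtolower($text), $m);
    return array_count_values(array_diff($m[0], STOP_WORDS));
}

/** Euclidean length of a term-frequency vector. */
function vector_norm(array $v): float
{
    return sqrt(array_sum(array_map(function ($x) { return $x * $x; }, $v)));
}

/** Cosine similarity of two term-frequency vectors (0.0 if either is empty). */
function cosine(array $a, array $b): float
{
    $dot = 0;
    foreach ($a as $term => $count) {
        if (isset($b[$term])) {
            $dot += $count * $b[$term];
        }
    }
    $den = vector_norm($a) * vector_norm($b);
    return $den > 0 ? $dot / $den : 0.0;
}

/**
 * Single-pass "leader" clustering: each article joins the first cluster whose
 * leader (first member) has cosine similarity >= $threshold with it,
 * otherwise it starts a new cluster. Returns lists of article keys.
 */
function cluster_articles(array $articles, float $threshold = 0.3): array
{
    $clusters = []; // each entry: ['leader' => vector, 'members' => [keys]]
    foreach ($articles as $key => $text) {
        $vec = term_vector($text);
        foreach ($clusters as $i => $cluster) {
            if (cosine($vec, $cluster['leader']) >= $threshold) {
                $clusters[$i]['members'][] = $key;
                continue 2; // move on to the next article
            }
        }
        $clusters[] = ['leader' => $vec, 'members' => [$key]];
    }
    return array_column($clusters, 'members');
}

$articles = [
    'a1' => 'The football team won the league match',
    'a2' => 'Fans celebrate as the team wins the league',
    'a3' => 'New smartphone released with faster processor',
    'a4' => 'The smartphone processor benchmark results',
];
print_r(cluster_articles($articles)); // groups a1 with a2, and a3 with a4
```

The result depends on article order and on the threshold; TF-IDF weighting or k-means would be natural alternatives to compare in the write-up.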
Deliverables
- The source code for your algorithm
- The results of running it on the sample data
- A document - in Markdown or Textile - that describes the algorithm you created, why you chose it and how it could be improved, and that also discusses any other techniques you discarded.
Upload all the source code, results and the write-up as a Gist on GitHub.
/* TASK 2 ARTICLE CLUSTERING */
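$row['publish'],
        "content"    => $row['content'],
        "source_url" => $row['source_url'],
        "author"     => $row['author'],
        "title"      => $row['title']);
}

// Encode the collected article rows as JSON.
$json = json_encode($rows);

// Only accept a callback made of identifier characters, so the request
// cannot inject arbitrary script into the JSONP response.
$callback = isset($_GET['callback']) ? $_GET['callback'] : '';
if (preg_match('/^[A-Za-z_$][A-Za-z0-9_$.]*$/', $callback)) {
    header('Content-Type: application/javascript');
    echo $callback . '(' . $json . ')';
} else {
    // No valid callback supplied: fall back to plain JSON.
    header('Content-Type: application/json');
    echo $json;
}
?>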