- How media works
- There's a difference in positioning: in-depth vs breaking news
- Talent crunch and margin pressures: not enough staff to 'break news'
- Sources of breaking news: agencies, in-house, competition, social media
- Increasingly, social media is a dominant source
- How can we source social media data at scale
- Twitter vs Facebook vs Google Trends vs ...: accessibility vs reach
- Streaming in real-time (importance of sub-second responses for TV)
- Parallel extraction: sockets & threads -- importance of async (and why Node.js is better than Python 2)
- Storage: JSON and the coming of age of RDBMSs (and why Postgres is as good as MongoDB)
- Distributed scraping -- building a headless browser farm
- Client-side scraper farms as alternatives -- building Chrome plugins
- Filtering sources for insights
- Why traditional entity extraction fails
- Fuzzy matching in the Indian context: key-collision vs distance-based methods
- How visuals help flexibly identify topic clusters -- k-means and beyond
- Determining the importance and relevance of topics
- Manual vs automated filtering -- negative lists
- Structure of the final solution -- what it looked like, and what it resulted in
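
The key-collision vs distance-based distinction in the fuzzy-matching item above can be illustrated with a minimal sketch. It is not the method used in the talk: the sample names, the `fingerprint` normalisation rules, and the `max_distance=2` threshold are all assumptions chosen for illustration, and the accent-stripping step assumes romanized (Latin-script) names. Key collision groups spellings that normalise to the same key; a distance-based pass then flags keys that are still a few edits apart.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Key-collision key: lowercase, drop accents and punctuation, sort unique tokens.

    Assumes romanized names; non-Latin scripts would be stripped entirely.
    """
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    tokens = re.sub(r"[^a-z0-9\s]", " ", ascii_name.lower()).split()
    return " ".join(sorted(set(tokens)))


def levenshtein(a: str, b: str) -> int:
    """Edit distance: fewest single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def cluster_by_key(names):
    """Group raw names whose fingerprints collide."""
    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)
    return dict(clusters)


def close_pairs(keys, max_distance=2):
    """Return pairs of keys within max_distance edits of each other."""
    ordered = sorted(keys)
    return [
        (a, b)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if levenshtein(a, b) <= max_distance
    ]


# Hypothetical sample data, not taken from the talk.
names = [
    "Narendra Modi",
    "Modi, Narendra",
    "narendra  modi",
    "Narender Modi",
    "Rahul Gandhi",
]
clusters = cluster_by_key(names)   # word order, case, punctuation collapse here
pairs = close_pairs(clusters)      # spelling variants surface here
```

In this sketch, key collision is cheap (one pass and a dictionary lookup) but misses spelling variants such as "Narender", while the distance-based pass catches them at the cost of comparing every pair of keys.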
Last active
August 29, 2015 14:23