I’ve had a number of interesting conversations with various groups engaged in natural language processing (NLP) over the past few weeks. These have included Harvard University’s Institute for Quantitative Social Science (IQSS), Harvard’s Berkman Center on Internet and Democracy, the European Commission’s Joint Research Center (JRC) and the private sector company Virtual Research Associates (VRA). There is a lot of very interesting (potentially groundbreaking) work being carried out in the field of NLP, particularly given the shift towards a more generative Internet (see Zittrain’s new book on the Future of the Internet and Benkler’s excellent piece on The Wealth of Networks for some fascinating insights on the generative Internet).
So I thought I’d use this blog entry to list some of the leading papers/articles on the topic:
- Hopkins, Daniel and Gary King. 2008. “Extracting Systematic Social Science Meaning from Text,” Institute for Quantitative Social Science, Harvard University. [PDF]
- Tanev, Hristo; Jakub Piskorski and Martin Atkinson. 2008. “Real-time News Event Extraction for Global Monitoring Systems,” Joint Research Center of the European Commission, Web and Language Technology Group of the Institute for the Protection and Security of the Citizen (IPSC). [PDF]
- Piskorski, Jakub; Tanev, Hristo; Martin Atkinson and Erik van der Goot. 2008. “Cluster Centric Approach to News Event Extraction,” Joint Research Center of the European Commission, Institute for the Protection and Security of the Citizen (IPSC). [PDF]
- King, Gary and Will Lowe. 2003. “An Automated Information Extraction Tool for International Conflict Data with Performance as Good as Human Coders: A Rare Events Evaluation Design,” International Organization, 57, Summer 2003: 617-642. [PDF]
- Bond, Doug; Bond, Joe; Oh, Churl; Jenkins, Craig and Charles Taylor. 2003. “Integrated Data for Events Analysis (IDEA): An Event Typology of Automated Events Data Development,” Journal of Peace Research, 40(6): 733-745. [PDF]
I would be happy to continue adding to this list, so if anyone has recommendations for additional references, please don’t hesitate to contact me.