Text is arguably the most pervasive—and certainly the most persistent—artifact of political behavior. Extensive collections of texts with clearly recognizable political—as distinct from religious—content go back as far as 2500 BCE in the case of Mesopotamia and 1300 BCE for China, and 2400-year-old political discussions dating back to the likes of Plato, Aristotle, and Thucydides are common fare even in the introductory study of political thought. Political tracts were among the earliest productions following the introduction of low-cost printing in Europe—fueling more than a few revolutions and social upheavals—and continuous printed records of legislative debates, such as the British parliament's Hansard and precursors tracing to 1802, cover centuries of political discussion.
Every good book has a small bit (a sentence, a paragraph, maybe a page) that the authors intended as a simple aside but which brings an epiphany to the reader. In Brady and Collier (2004), this occurs at the beginning of chapter 3: Brady's critique of the “quantitative template,” where the recovering seminarian frames our discourse on the philosophy of social inquiry in terms of pragmatic theology and homiletics, rather than science or sociology. Hey, that is it! While this debate is not in any sense about religion, its dynamics are best understood as though it were about religion. We have always known that; it just needed to be said.
The ID3 algorithm is an inductive artificial intelligence technique that generates classification trees. These trees are similar to those used in simple expert systems; with ID3 they are generated by machine rather than by human experts. This article applies a bootstrapped ID3 to the Butterworth data set on interstate conflict management. By generating a number of classification trees from randomly selected subsets of the complete data set, the variables that most effectively predict the outcome of the conflict management effort are identified, and the degree of unpredictability in the data is estimated from the accuracy of the classification tree in predicting cases not in the training set. The original set of 38 independent variables can be reduced to 5 or fewer with almost no loss of accuracy; classification trees using these variables have 95–100 percent accuracy when fitted to the entire data set and an average accuracy of 50–60 percent in predicting new cases in split-sample tests. Unlike many existing statistical techniques, the classification tree is a plausible model of human inductive knowledge representation since it is compatible with the cognitive constraints of the human brain.
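The core of ID3 as described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the attribute names (`mediation`, `great_power`), the outcome labels, and the four toy cases are hypothetical, and the bootstrapping and split-sample testing steps are omitted. At each node the sketch picks the attribute with the highest information gain and recurses on the resulting subsets.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting the cases on `attr`."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Build a classification tree; leaves are labels, nodes are dicts."""
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {"attr": best, "branches": {}, "default": majority}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        tree["branches"][value] = id3(sub_rows, sub_labels, remaining)
    return tree


def classify(tree, row):
    """Follow the tree to a leaf; unseen values fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy cases, for illustration only (not the Butterworth data).
rows = [
    {"mediation": "yes", "great_power": "no"},
    {"mediation": "yes", "great_power": "yes"},
    {"mediation": "no", "great_power": "no"},
    {"mediation": "no", "great_power": "yes"},
]
labels = ["managed", "managed", "unmanaged", "unmanaged"]
tree = id3(rows, labels, ["mediation", "great_power"])
```

In this toy set the outcome depends only on `mediation`, so the tree splits on that attribute alone and ignores `great_power`, which is the same kind of variable reduction the abstract reports at a larger scale.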
Due in large part to the proliferation of digitized text, much of it available for little or no cost from the Internet, political science research has experienced a substantial increase in the number of data sets and large-n research initiatives. As the ability to collect detailed information on events of interest expands, so does the need to efficiently sort through the volumes of available information. Automated document classification presents a particularly attractive methodology for accomplishing this task. It is efficient, widely applicable to a variety of data collection efforts, and considerably flexible in tailoring its application for specific research needs. This article offers a holistic review of the application of automated document classification for data collection in political science research by discussing the process in its entirety. We argue that a two-stage support vector machine (SVM) classification process offers advantages over other well-known alternatives because SVMs are discriminative classifiers that can effectively address two primary attributes of textual data: high dimensionality and extreme sparseness. Evidence for this claim is presented through a discussion of the efficiency gains derived from using automated document classification on the Militarized Interstate Dispute 4 (MID4) data collection project.
By
John Beieler, Pennsylvania State University, john.b30@gmail.com,
Patrick T. Brandt, University of Texas, Dallas, pbrandt@utdallas.edu,
Andrew Halterman, Caerus Associates, ahalterman0@gmail.com,
Philip A. Schrodt, Parus Analytical Systems, schrodt735@gmail.com,
Erin M. Simpson, Caerus Associates, emsimpson@gmail.com
Political event data are records of interactions among political actors using common codes for actors and actions, allowing for the aggregate analysis of political behaviors. These data include both material interactions between political entities and verbal statements. Such data are common in international relations, recording the spoken or direct actions between nation-states and other political entities. Event data can be generated through either human-coded or machine-based methods. Human-coded event data efforts continue to dominate research on global protests and social movements, although data sets in international relations have led the movement toward automated coding. While humans are better able to extract the meaning in sentences using background knowledge and innate abilities for dealing with complex grammatical constructions, human coding is dramatically more labor and time intensive than machine-coding approaches for anything but small or one-off data sets. Machine-coded methods can attain 70–80% accuracy when compared to a human-coded “gold standard,” which is comparable to, and in some cases exceeds, the intercoder reliability of human coding (King and Lowe, 2004). This makes the machine-coded methods quite scalable in terms of costs and time and thus attractive to academic, government, and private sector researchers.
King (2011) notes that the ability to code and process political texts to generate records like event data will be de rigueur in the later part of the 21st century. Machine-readable text about politics, including news reports, speeches, press conferences, and intelligence reports, is already the basis of many political analyses. The ever-increasing availability of such texts presents both opportunities and challenges because they are a form of “big data.” Even processing just the lead sentences of Reuters and Agence France-Presse (AFP) news reports for the Levant from 1979–2011 generates more than 140,000 distinct time-series records (http://eventdata.parusanalytics.com/data.dir/levant.html), and these sentences could also be processed as a much larger set of network relationships. One recent effort to expand event data collection outside of this geographical region – albeit without the event de-duplication found in most event data sets – has generated nearly a quarter of a billion records. Extrapolating from our coding experience with the Levant and our initial experiments with the EL:DIABLO coding system described later, we estimate that a data collection with duplication controls like that for the Levant data set will generate around 4,000 to 8,000 distinct records per day for the entire globe.
The Behavioral Origins of War. By D. Scott Bennett and Allan C. Stam. Ann Arbor: University of Michigan Press, 2004. 300p. $59.50 cloth, $24.95 paper.
An aphorism in the natural sciences states that one should either write the first article on a subject or the last one. The statistical study of war began in the 1930s and 1940s with the work of Lewis Richardson and Quincy Wright, then expanded massively in the 1960s with the Correlates of War project based at the University of Michigan. Those were the first articles. This book is potentially the last important one.
The department where I did my graduate training in the early 1970s was bitterly split between advocates of case-study and statistical approaches. At the time, both sides thought the other would fade away—statistical analysis was a fad; case studies, a relic from a prescientific past. But 30 years later, both methods persist, and the debate has recently intensified in response to King, Keohane, and Verba's (1994) assertion in Designing Social Inquiry that the methodology of case studies could be subsumed under that used in statistical research. The polite names for the two positions have changed—“case study” versus “large N” is more common now than the “traditional” versus “scientific” monikers of the 1960s and 1970s; the epithets—“slow journalism” versus “mindless number crunching”—remain much the same.
We use cluster analysis to develop a model of political change in the Levant as reflected in the World Event Interaction Survey coded event data generated from Reuters between 1979 and 1998. A new statistical algorithm that uses the correlation between dyadic behaviors at two times identifies clusters of political activity. The transition to a new cluster occurs when a point is closer in distance to subsequent points than to preceding ones. These clusters begin to “stretch” before breaking apart, which serves as an early warning indicator. The clusters correspond well with phases of political behavior identified a priori. A Monte Carlo analysis shows that the clustering and early warning measures are not random; they perform very differently in simulated data sets with similar statistical characteristics. Our study demonstrates that the statistical analysis of newswire reports can yield systematic early warning indicators, and it provides empirical support for the theoretical concept of distinct behavioral phases in political activity.
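The transition rule in the abstract above (a point starts a new cluster when it is closer to the points that follow it than to those that precede it) can be illustrated with a short Python sketch. Several choices here are assumptions made for illustration and are not taken from the article: the distance is taken as one minus the Pearson correlation between two time points' vectors of dyadic scores, the comparison window `k` is fixed at two points on each side, and the six-period series is invented.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length, non-constant vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def distance(u, v):
    """Assumed distance: 0 for perfectly correlated vectors, 2 for opposite ones."""
    return 1.0 - pearson(u, v)


def transition_points(series, k=2):
    """Indices whose mean distance to the next k points is smaller than
    their mean distance to the previous k points."""
    flagged = []
    for t in range(k, len(series) - k):
        before = sum(distance(series[t], series[t - i]) for i in range(1, k + 1)) / k
        after = sum(distance(series[t], series[t + i]) for i in range(1, k + 1)) / k
        if after < before:
            flagged.append(t)
    return flagged


# Invented series: each row holds one period's scores for three dyads.
# The pattern of behavior reverses between the third and fourth periods.
series = [
    [1.0, 2.0, 3.0],
    [1.0, 2.1, 3.2],
    [0.9, 2.0, 3.1],
    [3.0, 2.0, 1.0],
    [3.1, 2.0, 0.9],
    [3.0, 2.2, 1.0],
]
breaks = transition_points(series, k=2)
```

With this series the only flagged index is the fourth period (index 3), where the dyadic pattern reverses. Real event data would need a longer window and some handling of periods with no variation across dyads, for which the correlation is undefined.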
The computer has revolutionized the use of statistical techniques in social science research. In recent years microcomputers have begun to replace large mainframe computers in such applications as word processing, accounting and data base management. While the use of microcomputers as statistical processors is still in its infancy, and existing software leaves a lot to be desired, the economics and convenience of statistical work on microcomputers point to an increasing role for micros in this area.
This article will survey the costs and benefits of doing statistical work with microcomputers. It is not a discussion of individual statistical programs but instead focuses on the general issues of what you need and what you can expect to do.
Lewis Fry Richardson's simple differential equations model of armaments races has long been criticized for its lack of incorporation of the goals of nations. Using the mathematics of optimal control theory, the authors formulate a model which incorporates national goals into an “arms balance” objective function. The goals used are based on the traditional concerns in the balance-of-power literature. From an objective function together with the Richardson model an optimal armaments policy is derived. The United States-Soviet, NATO-WTO, and Arab-Israeli arms races are used as empirical examples, and the parameters in the model are estimated by means of functional minimization techniques. The optimal control model is further examined for its equilibrium and stability properties. The equilibrium and stability conditions are assessed with respect to the empirical examples. The findings are that while the United States and the Soviet Union in direct confrontation pursue strategies that lead to a lack of equilibrium and stability, when taken as part of NATO and WTO, the major powers and their alliance partners do pursue stable and equilibrium strategies. The Israeli policy is found to lead to equilibrium and stability while the Arab policy does not.
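For orientation, the baseline Richardson model that the abstract above builds on can be written as dx/dt = a*y - m*x + g and dy/dt = b*x - n*y + h, where x and y are the two sides' armament levels, a and b are reaction coefficients, m and n are fatigue (cost) coefficients, and g and h are grievance terms. The Python sketch below covers only this classical model, not the article's optimal control extension, and the parameter values are arbitrary illustrative numbers rather than estimates from the article. It computes the equilibrium, checks the standard stability condition m*n > a*b (assuming positive coefficients), and integrates the system with a simple Euler step.

```python
def richardson_equilibrium(a, b, m, n, g, h):
    """Solve 0 = a*y - m*x + g and 0 = b*x - n*y + h for (x, y)."""
    det = m * n - a * b
    if det == 0:
        raise ValueError("no unique equilibrium when m*n == a*b")
    x_star = (n * g + a * h) / det
    y_star = (b * g + m * h) / det
    return x_star, y_star


def is_stable(a, b, m, n):
    """With positive coefficients, the equilibrium is stable when fatigue
    outweighs mutual reaction, i.e. m*n > a*b."""
    return m * n > a * b


def simulate(a, b, m, n, g, h, x0=0.0, y0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the two Richardson equations."""
    x, y = x0, y0
    for _ in range(steps):
        dx = a * y - m * x + g
        dy = b * x - n * y + h
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Arbitrary illustrative parameters (not estimates from the article).
params = dict(a=0.5, b=0.5, m=1.0, n=1.0, g=1.0, h=1.0)
equilibrium = richardson_equilibrium(**params)
final_state = simulate(**params)
```

With these values m*n = 1.0 exceeds a*b = 0.25, so the race is stable and the simulated armament levels settle at the equilibrium (2, 2). Reversing the inequality, for example with a = b = 2, produces the runaway growth that Richardson associated with unstable arms races.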