Multidimensional Scaling for Linguists Using Optimal Classification



Multidimensional scaling (MDS) is a technique for visualizing the relationships among data points that are compared along very many dimensions. Much linguistic data, particularly data on variation across grammatical or other contexts and across languages, has a form amenable to MDS. In particular, complex crosslinguistic data that is difficult to represent using semantic maps can be analyzed using MDS. Optimal Classification is an algorithm created by Keith Poole for MDS analysis in political science; we applied it to linguistic data in Croft and Poole, "Inferring universals from grammatical variation: multidimensional scaling for typological analysis" (Theoretical Linguistics 34.1-37, 2008). Jason Timm has adapted Poole's R code to accept linguistic data in a uniform input format and to produce a range of analytical output, including tables and graphs of the analyzed data. We make this code available here. If you have comments or bug reports, please contact wcroft (at) unm (dot) edu or jtimm (at) unm (dot) edu.
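In Poole's political-science setting, Optimal Classification places the objects being compared as points in a low-dimensional space. Each binary choice is represented as a cutting line (a cutting plane in higher dimensions), and the algorithm searches for the configuration that minimizes classification errors. The sketch below shows only the error-counting step for a single cutting line in two dimensions.

Several details are illustrative assumptions, not code from the OC Script for Linguists:
- the function name `classification_errors`;
- the toy data;
- the reading of points as grammatical contexts and of each binary choice as "this language's form is used in this context".

```python
# Hypothetical sketch of the error count behind Optimal Classification.
# This is NOT the OC Script for Linguists; names and data are illustrative.
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def classification_errors(
    points: Sequence[Point],
    choices: Sequence[bool],
    normal: Tuple[float, float],
    c: float,
) -> int:
    """Count points misclassified by the cutting line {x : normal . x = c}.

    choices[i] is True if point i belongs to the category (for example,
    if a given form is used in context i). Either side of the line may
    stand for True, so both orientations are tried and the smaller
    error count is returned.
    """
    # Which side of the cutting line each point falls on.
    side: List[bool] = [normal[0] * x + normal[1] * y > c for (x, y) in points]
    # Errors if the "above" side predicts True.
    errors_above = sum(s != ch for s, ch in zip(side, choices))
    # Flipping the orientation flips every prediction.
    errors_below = len(points) - errors_above
    return min(errors_above, errors_below)


# Toy example: four contexts on a line; the form covers the first two.
contexts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
uses_form = [True, True, False, False]
print(classification_errors(contexts, uses_form, (1.0, 0.0), 1.5))  # 0
print(classification_errors(contexts, uses_form, (1.0, 0.0), 0.5))  # 1
```

The full algorithm, as Poole describes it, alternates between repositioning the cutting lines and repositioning the points, each time reducing the total error summed over all cutting lines. Those search steps are omitted here.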


MDS for Linguists: User Guide (PDF, revised 11/23/13). This document describes what type of linguistic data MDS can be used for, how to use the R programs, and how to interpret the results of the MDS analyses.

OC Script for Linguists (R code, version 11/16/13). Update 11/16/13: This version corrects a bug that arose in the Mac version of the program. It also allows the user to control the minimum number of categories and the lopsidedness of categories included in the MDS analysis (see the revised User Guide for explanation).

Script for Displaying a Subset of Cutting Lines (R code, version 7/15/13)

Sample data file of indefinite pronouns (with thanks to Martin Haspelmath)

Keith Poole's Optimal Classification web page