University rankings are becoming more popular, but at the same time also more controversial. Apart from various methodological shortcomings, a major issue concerns the global nature of most university rankings. “Global” means that entire universities are ranked on an overall “quality”, most often aggregated from more specific indicators. The two most prominent ranking systems, Shanghai and THES, and most other ranking systems still provide a global ranking as their most important result, but the criticism is growing. As a result, multidimensionality is becoming more and more the norm. For example, when the European Commission launched its call for tenders to construct university rankings, the first feature mentioned in the press release of 11 December 2008 was that this new type of ranking is “multidimensional”, referring to education, research, innovation, internationalisation and community outreach. The tender was won by the CHERPA-network consortium from four European countries, including the German CHE, an explicit proponent of multidimensionality.
Although skepticism about global rankings is growing among experts, this skepticism will most likely not make them go away. In all kinds of public domains, including sports and business, global rankings seem to fulfill a need. When they do not exist, they are created, as when gold medal counts during the Olympics are turned into rankings across sport disciplines. Thinking in overall terms predominates in all kinds of domains. Nuances are interesting, but they remain nuances. When global rankings do exist, they seem to be functional indeed. They enhance the motivation to improve, they are a basis for choices to be made, and they are a source of self-esteem and status for those who are associated in some way with top-ranked businesses, schools, sport teams, etc. Specific rankings can fulfill a similar role, but they seem less influential.
Several arguments are used against global rankings. A first argument is that the “quality” of universities is multidimensional. Universities have multiple tasks, such as research, teaching, and service to the community, and each of these tasks is fulfilled in different scientific domains, so that quality can differ depending on the task and the scientific domain one looks at. A second argument is that most so-called global rankings are actually based on a limited set of aspects, and therefore have only a limited scope. For example, if no relevant indicators of teaching quality are included, the rankings are at best partial. A third argument is that universities are diverse, so that a common ranking system is not appropriate. Some universities call themselves “research intensive”, and others call themselves “universities of technology”. Although these terms do not contradict each other, they imply a somewhat different focus. In a similar way, the relative emphasis on research and teaching may differ. A different focus would require a different set of indicators.
Although all three critical arguments are valid to some extent, they are perhaps not of the kind to reject global rankings a priori. First, the multidimensionality argument is implicitly based on a “reflective” measurement model, in which indicators are seen as reflections of a common underlying “quality”. Divergence of indicators, and hence multidimensionality, contradicts a common underlying “quality” and therefore invalidates the measurement. However, measurement can be formative instead of reflective, with “quality” measured as the formative (i.e., cumulative) effect of the indicators. In the formative view, “quality” is seen as a result, not as an intrinsic characteristic at the origin of a result. Formative measurement does not require that indicators converge. According to this view, “quality” is not in the nature of a university; it is in the results a university obtains. A formative concept of “quality” is also attractive from a pragmatic point of view. If “quality” is defined as a combination of results, it is easier to improve than when it is defined as an underlying characteristic, and endless discussions about what “quality” means can be avoided.
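The formative view can be sketched in a few lines of code. The indicator names, scores, and weights below are invented for illustration only; the point is that in a formative model the weights are explicit policy choices, not estimates of a latent trait, and the indicators need not converge:

```python
# Hypothetical sketch of formative measurement: "quality" as the
# cumulative (weighted) effect of indicator results.
# All names, scores, and weights are invented for this example.

# Scores of one university on six hypothetical indicators (0-100 scale).
indicators = {
    "publications": 78,
    "citations": 65,
    "awards": 40,
    "staff_student_ratio": 55,
    "international_staff": 70,
    "reputation_survey": 60,
}

# Formative weights: chosen by the ranking designer, summing to 1.
weights = {
    "publications": 0.25,
    "citations": 0.25,
    "awards": 0.10,
    "staff_student_ratio": 0.15,
    "international_staff": 0.10,
    "reputation_survey": 0.15,
}

# "Quality" is simply the weighted combination of results.
formative_quality = sum(weights[k] * indicators[k] for k in indicators)
print(round(formative_quality, 2))  # prints 64.0
```

Under the reflective view, by contrast, one would first check whether the indicators converge (e.g., via their intercorrelations) before treating any composite score as meaningful.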
Second, the limited-scope argument implies that a measure is valid only if its scope is not limited in comparison with the concept it is supposed to measure. Suppose that a course consists of four chapters and that a test is given at the end of the course. It is good practice for the test to cover all four chapters, but this is not a necessary condition. Any measure, even a very limited one, with a high correlation to the overall degree of mastery would be a valid measure as well. Whether a measure may be considered valid as a global measure is an empirical rather than a conceptual issue. It is true, however, that the empirical validation itself would require an approach with a broader scope.
Third, the diversity argument is based on the belief that universities differ in their aims. Although this may seem evident, it is not supported by documents such as those produced by the European University Association (EUA) and the Coimbra Group, nor by nation-wide systems for university funding based on a common set of parameters. Instead, most universities have about the same ambition and strive for excellence in research, teaching, and service to society. Diversity is an interesting concept, but the reality, at least in Europe, is that most universities have about the same ideals.
Of course, global rankings do not tell the whole truth, and more specific rankings tell more specific truths. Both are meaningful ways to evaluate universities. Moreover, the three arguments cannot decide between global and specific rankings, since they apply to specific rankings as well as to global rankings. Any kind of ranking is based on indicators which diverge to some extent, the set of indicators can always be extended to be more complete, and diversity is omnipresent, in research as well as in teaching. Rankings can always be decomposed into more specific rankings, like Russian nesting dolls, so that in the limit an infinite number of very specific rankings would be obtained, and, ironically, a strong need for an overall view would be felt.
As a counterfactual and pragmatic test of the concept of global rankings, I have analyzed the six THES indicators and the six Shanghai indicators over several years, using a multidimensional scaling analysis, which is based on reflective measurement. The result is an empirically based map that places the twelve indicators (six from each system, over several years) and 200 universities in a common multidimensional space. The twelve indicators were all found on an arc at the top of the major dimension, like stars in the sky. This result is not evident, given the rather different nature of the indicators. The major dimension may therefore be considered the global “quality” dimension: the closer a university is to the “stars” (the indicators), the better it performs. Interestingly, a slight diversity was found among the universities, as some of them approach the sky from the left and others from the right. The former are mainly “technological” universities, whereas the latter are mainly “academic” universities. The results show that, as far as the twelve indicators are concerned, their multidimensionality does not preclude a dominant global “quality” dimension, so that global rankings can be meaningful, even when using reflective measurement.
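To give a flavour of the technique, here is a minimal sketch of classical (Torgerson) multidimensional scaling, assuming NumPy is available. The data are invented and the sketch is not the author's actual analysis; it only illustrates how objects described by several indicators can be projected into a common low-dimensional “map”:

```python
# Minimal classical MDS (Torgerson) sketch with invented data.
# Rows = universities, columns = hypothetical indicator scores.
import numpy as np

scores = np.array([
    [90, 85, 88, 70, 60, 75],
    [60, 55, 62, 40, 45, 50],
    [75, 70, 72, 55, 50, 65],
    [40, 45, 38, 30, 25, 35],
    [85, 80, 83, 65, 55, 72],
], dtype=float)

# Pairwise Euclidean distances between the objects.
diff = scores[:, None, :] - scores[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

# Double-centre the squared distances: B = -1/2 * J D^2 J.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# Coordinates come from the leading eigenvectors of B,
# scaled by the square roots of their eigenvalues.
eigvals, eigvecs = np.linalg.eigh(B)       # ascending order
order = np.argsort(eigvals)[::-1]          # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
coords = eigvecs[:, :2] * np.sqrt(np.maximum(eigvals[:2], 0))

print(coords.shape)  # (5, 2): each object placed in a 2-D map
```

In the analysis described above, both the indicators and the universities are placed in the same space, so that the dominant dimension of the map can be read as a global “quality” axis.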
by Paul De Boeck, vice-president for research, K.U.Leuven, January 9, 2009.
Paul De Boeck holds a PhD in psychology, is a full professor at the University of Amsterdam, and is a former vice-president for research at the K.U.Leuven (Belgium). His research area is statistical modeling of behavioral data and model-based measurement of psychological traits and educational achievements. He is past president of the Psychometric Society (2007-2008).
Categories: Leadership in Education