Perceptual dialectology in action: the case of stigmatized Geg [1]

by Jeta Alla*, Dijon Ismaili*, Enkeleida Kapia**, Sonja Krasniqi*, Sara Ymeri*[2]



Perceptual dialectology, the investigation of how language attitudes and ideologies are linked to speech in particular geographic regions, is often used as a tool to tap into the social meaning of regional linguistic variation (Preston 1989). While no research in perceptual dialectology has so far been conducted for Albanian, a substantial body of such research exists for other languages. For example, perceptual dialectology studies of American English have shown that subjects divide the country into the “stigmatized South” and “the normative North” (Hartley and Preston 1999; Preston 1996, 1999; Bucholtz et al. 2008). More specifically, the sociolinguistic boundary in the United States runs between two large conceptual regions: people associate the more prestigious speech in America with Northern dialects, while low-prestige speech is associated with the South. In other words, the Northern dialects are “good” and the Southern ones are “bad”, thus creating the perception of two large geographic areas (Hartley and Preston 1999; Preston 1996, 1999; Bucholtz et al. 2008). The map in Figure 1 from Cramer (2016) illustrates these findings and shows how people from Louisville perceive Maine, Maryland, Ohio, Colorado, Nevada and Washington, DC, to be beacons of correctness, while most of the Southern states, West Virginia, and New Jersey are rated rather poorly.

Figure 1. Louisvillians’ correctness ratings from 0 to 9 (with 9 being the most correct), adapted from Cramer (2016)

While language ideologies concerning major and well-funded languages have been investigated to some extent, little to no work has been done to understand these phenomena in languages with fewer speakers, such as Albanian. Our article presents results from a survey given to individuals born and raised in Albania, who were asked about their attitudes towards different linguistic varieties within the country. In order to offer a more accurate sociopolitical analysis of attitudes about linguistic diversity in Albania, we utilized quantitative sociolinguistic methods. More broadly, our study enriches the investigation of language attitudes and ideologies within sociolinguistics, social psychology, and linguistic anthropology (e.g., Blommaert 1999; Jaworski et al. 2004; Kroskrity 2000; Niedzielski and Preston 1999; Woolard et al. 1998) with data from smaller language varieties within Albanian, i.e. Tosk (spoken in the south) and Geg (spoken in the north).

The relevance of a study of this kind is not just theoretical. It is also relevant within Albanology because it seeks to understand perceptual evaluations within Albania, which sometimes transcend even the boundaries of the country, as in the movie “Taken” (2008), where members of the Albanian mafia are depicted as speaking the Geg variety. Interestingly, this pattern was also observed in a recent sociolinguistic study (Morgan 2015:95), which showed that “categories of Geg and Tosk represent not only linguistic division, but also social and geographic division.” Morgan (2015, 2017) highlights that images of the rural, undeveloped, and isolated Geg and the urban, developed, and institutionally powerful Tosk that is connected to the “outside world” have existed since Ottoman times and persist today in the form of what Agha (2007) calls “characterological figures”. Indeed, some of the subjects in her study are quoted as having labeled Geg as “e folme e trashë” (thick, uncultured, stupid), while Tosk is “e folme e butë” (soft, cultured, elegant). Importantly, Morgan (2015) also found that people perceived speech from Central Albania (Shqipëria e mesme) to be different from both Tosk and Geg, and certainly to be less e trashë than Geg, thus suggesting that people perceive the existence of three major varieties of Albanian, and not two.

While Morgan’s qualitative research offers a suggestive starting point for the discussion of language attitudes among Albanians, our study aims to investigate this topic further using quantitative methods with a large subject pool.


The research design for this project utilized the overarching techniques of perceptual dialectology, as formulated by Preston (e.g. 1989, 1993). Four graduate students and their instructor conceptualized and designed the study and the survey, and collected the data, as part of the final assignment for the Variationslinguistik (Variational Linguistics) course in their Albanology Masters at Ludwig-Maximilians University in Munich. The data were coded, analyzed, interpreted and written up in the form of an article as part of an Advanced Albanology course within the same MA program the following semester. The research instrument was a rating survey of all 12 districts of the Albanian-speaking territory within the Republic of Albania (as seen in Figure 2) on two scales from 1 to 10: pleasantness and correctness, with 1 being the least pleasant/correct and 10 being the most pleasant/correct. Every student fieldworker collected the completed maps and surveys from 30 respondents, who were also required to answer sociodemographic questions on gender, age, place of birth and place of residence, together with the number of years spent in each of these locations (this latter information was used to compute the dialectal origin of each participating subject).

A requirement was that the subjects were born and raised in Albania and had not emigrated to another country. Their task was to fill out two blank maps of Albania showing the 12 geographical divisions, as in the map below[3] (Figure 2), with numbers from 1 to 10 indicating how correct and how pleasant they perceived the speech of these communities to be.

Figure 2. Map of geographical division (according to the 12 current counties as determined by Albanian law in 2014)

After completion of the survey, we obtained 248 completed maps, 124 for correctness and 124 for pleasantness, but here we will only present data from the correctness scale in order to stay within the length suggestions of this online publication.

Our quantitative analysis was performed using R software (2023). Demographics of our subject pool are provided in Figures 3 – 5. The larger number of young people (see Figure 3) could be related to the fact that the data were collected online and this population group is readily reached via online surveying. We have no explanation as to why we have more females than males in our subject pool (see Figure 4), though this is not uncommon for studies of this kind. We also have no solid explanation as to why more Geg people participated in our survey, though a similar pattern was noticed in another study conducted at LMU (Riverin-Coutlée, Kapia and Gubian, in preparation). These discrepancies in number, though, were taken into account in our analysis.

Figure 3. Number of subjects by age group

Figure 4. Number of subjects by gender

Figure 5. Number of subjects by dialectal origin

Results and discussion

Before we interpret the results in Figure 6, we need to explain that we have grouped the 12 current counties of Albania into three main groups for reasons of presentation, i.e. central Albania (Tiranë, Durrës, Elbasan), north Albania (Shkodër, Dibër, Lezhë, Kukës) and south Albania (Gjirokastër, Korçë, Vlorë, Fier, Berat). Another reason for doing so was to also verify whether Morgan’s (2015) suggestion about a three-way division holds true in our data. In addition, it should be noted that the black line in the middle of each boxplot represents the median value per group.
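The three-way grouping described above, and the per-group median that each boxplot marks, can be sketched in a few lines of code. This is only an illustrative sketch in Python, not the authors' R analysis: the function name `median_by_region`, the tuple layout of the input, and the toy ratings are all assumptions introduced here, and the scores are invented and do not come from the survey. Only the county-to-region mapping is taken from the text.

```python
from collections import defaultdict
from statistics import median

# County-to-region grouping exactly as listed in the text.
REGION_OF_COUNTY = {
    "Tiranë": "central", "Durrës": "central", "Elbasan": "central",
    "Shkodër": "north", "Dibër": "north", "Lezhë": "north", "Kukës": "north",
    "Gjirokastër": "south", "Korçë": "south", "Vlorë": "south",
    "Fier": "south", "Berat": "south",
}


def median_by_region(ratings):
    """Return {(subject_origin, region): median score}.

    `ratings` is an iterable of (subject_origin, county, score) tuples;
    this layout is a hypothetical one chosen for the sketch.
    """
    grouped = defaultdict(list)
    for origin, county, score in ratings:
        # The survey scale runs from 1 (least correct) to 10 (most correct).
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        grouped[(origin, REGION_OF_COUNTY[county])].append(score)
    return {key: median(scores) for key, scores in grouped.items()}


# Invented ratings for illustration only; NOT the study's data.
toy_ratings = [
    ("Geg", "Tiranë", 9), ("Geg", "Durrës", 8),
    ("Geg", "Shkodër", 4), ("Geg", "Kukës", 5),
    ("Geg", "Korçë", 7),
    ("Tosk", "Elbasan", 8), ("Tosk", "Lezhë", 3),
    ("Tosk", "Vlorë", 7), ("Tosk", "Fier", 6),
]

result = median_by_region(toy_ratings)
for (origin, region), value in sorted(result.items()):
    print(origin, region, value)
```

Keying the result by both subject origin and region mirrors the layout of a boxplot split by origin, so the same function could be reused with gender or age group in place of origin.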

Figure 6. Boxplot of the perception scores of correctness for the three geographical areas based on the subjects’ origin

The main observation for Figure 6 is that our subjects gave the highest score for correctness to the variety spoken in central Albania and the lowest score to that spoken in north Albania. The second observation is that speakers of both Geg and Tosk origin perceive these three areas in the same way. In other words, speakers of Geg origin think of themselves as speaking the least correct variety, whereas speakers of Tosk origin think of themselves as speaking a more correct variety than Geg. Both Geg and Tosk speakers judge the variety spoken in central Albania to be the most correct. This picture is very similar to the one we described above for American English, and it also illustrates the phenomenon of linguistic insecurity (Labov 1966) among speakers of Geg, who rate their local speech as incorrect but rate other varieties as more correct. It also supports the idea presented in Morgan (2015) of a three-way division in the Albanian-speaking territory and replicates her results on how people perceive speech within Albania.

Figure 7. Boxplots of the perception scores of correctness for the three geographical areas based on the subjects’ gender

Figure 7 shows a distribution similar to the one in Figure 6. That is, our subjects considered the variety spoken in the northern area to be the least correct and the variety spoken in the central area to be the most correct. This is true for both males and females. However, it is noticeable here that Geg male speakers, when compared to Geg female speakers, think of their own variety as being less correct.


Figure 8. Boxplots of the perception scores of correctness for the three geographical areas based on the subjects’ age

For reasons of presentation, we have grouped our subjects into four age groups, as is standard practice in studies of this kind. At first glance, it seems that the distribution in Figure 8 is the same as in Figures 6 and 7. Essentially, all age groups give the highest score to the variety spoken in central Albania, followed by the variety spoken in south Albania, whereas the northern variety receives the lowest score again. However, a few things should be noted about this plot. First, the yellow boxes, which represent the oldest age group (65 years old and above), are characterized by less variability. This could mean that members of this age group have very set ideas about each of these geographical areas. The most varied group is the cyan age group (30 – 39 years old), whose scores sometimes range from 2 to 10, as is the case for central Albania. Second, we can see that the median values for the central Albania and north Albania varieties are similar across all age groups, whereas the median values for the south Albania variety differ depending on how old people are. For this geographical area (south Albania), it is the youngest people (18 – 29 years old), portrayed by the orange box, who give the lowest score for correctness. In other words, they rate the variety spoken in the south as less correct than the older age groups do, which may indicate that they are less influenced by dominant language ideology practices. In fact, the mean score for the variety spoken in the south and the variety spoken in the north is quite similar for the youngest age group (18 – 29 years old), again possible evidence that they are less guided by language ideology. But this should not overshadow the fact that even the youngest people think of central Albania as speaking the most correct variety. This is also true for the oldest age group, but unlike the youngest age group, they rate the variety spoken in the south as more correct than the variety spoken in the north, most likely because they were affected more by ideological practices.

We think that the youngest group does not rate northern and southern speech so differently not only because ideology practices may not have been as present in their lives, but also because the modernization of Albanian culture after the fall of communism has opened up opportunities for all language varieties to participate in public life through media, music, pop culture, literature, cinema, population mobility, etc. But it could also be that young people idealize the urban lifestyle of Tirana more, and speech may be one of the features associated with this lifestyle.

In conclusion, the main highlight from this study is that our subjects, regardless of origin, gender and age, consider the variety spoken in central Albania (Tiranë, Durrës, Elbasan) to be the most correct variety. This is most likely linked to the fact that this area is perceived as the most developed hub (politically, economically and culturally): a location to which people not only like to move, but whose inhabitants they also believe speak more correctly. Based on our results, we suggest that pedagogical practices in Albanian schools should stress that all varieties are correct and are an important part of the diverse grammatical system of Albanian. Our different and diverse ways of speaking (all varieties) should be celebrated and not stigmatized.


* affiliated with LMU Munich
** affiliated with LMU Munich/Akademia e Studimeve Albanologjike

We thank our subjects for their voluntary participation in this study. We are also grateful for the insightful feedback that Dr. Josiane Riverin-Coutlée gave us on an earlier version of the paper.


