(All code used to create this post is available on GitHub.)
I was recently reminded of a post that Mikael Huss wrote back in 2014, just after that year’s Swedish elections. In it he analysed the results of the municipal elections and looked at them in the light of a number of variables describing the municipalities, using principal component analysis (PCA) and random forest predictive modeling. The full blog post is available here. I do recommend checking it out since it’s an interesting read!
To begin with, I’d like to apologize for the mix of languages in this post. The data sources for this post are in Swedish, and I’ve not taken the time to translate them. However, since I expect this to be mostly of interest to Sweden-based readers anyway, it shouldn’t be too much of a problem.
As Mikael remarks in his original post, standard PCA is not suited to analysing compositional data (such as the voting data, where the vote shares for each municipality have to add up to 100%). So I wanted to look into using robust principal component analysis for compositional data (available in the “robCompositions” R package) to see how that would influence the results.
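To make the "compositional" point a bit more concrete, here is a minimal sketch of the centred log-ratio (clr) transform, one common way of moving shares that sum to one into a space where ordinary multivariate methods behave better. This is only an illustration in Python with NumPy. It is not the robust method from robCompositions and not the code behind this post, and the vote shares and the function name `clr` are made up for the example.

```python
import numpy as np

def clr(shares):
    """Centred log-ratio transform for rows of strictly positive shares."""
    shares = np.asarray(shares, dtype=float)
    # Re-close each row so that it sums to exactly one.
    shares = shares / shares.sum(axis=1, keepdims=True)
    log_shares = np.log(shares)
    # Subtract the row mean of the logs (the log of the geometric mean).
    return log_shares - log_shares.mean(axis=1, keepdims=True)

# Hypothetical vote shares: 3 municipalities x 4 parties (invented numbers).
toy_shares = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.20, 0.30, 0.40],
])

clr_values = clr(toy_shares)
print(clr_values.round(3))
```

Each transformed row sums to zero, and a municipality where all parties get the same share maps to a row of zeros. Note that the transform needs strictly positive shares, so parties with zero votes in a municipality would have to be handled separately.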
Let’s begin by having a look at the score plots overlaid with some information about the municipalities.
The two figures above show two similar views of the Swedish political landscape. There is a clear trend from rural to urban, with the left-right political spectrum following approximately the same lines.
A great deal of the variability in the data (77%) is explained by these two principal components. So let’s look at their compositions:
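For readers wondering where a number like "77% explained" comes from: it is the share of the total variance that falls on the first two components. The sketch below shows the calculation for plain (non-robust) PCA using the singular values of the centred data. The data are randomly generated stand-ins, not the election results, and `explained_variance_ratio` is just a helper name chosen for this example.

```python
import numpy as np

def explained_variance_ratio(data):
    """Share of the total variance captured by each principal component."""
    centred = data - data.mean(axis=0)
    # Squared singular values are proportional to the component variances.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

# Hypothetical data: 50 rows and 8 columns with very different spreads.
rng = np.random.default_rng(0)
column_scales = np.array([5.0, 3.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5])
toy_data = rng.normal(size=(50, 8)) * column_scales

ratios = explained_variance_ratio(toy_data)
first_two = ratios[:2].sum()
print(f"The first two components explain {first_two:.0%} of the variance")
```

The ratios come out sorted from largest to smallest and sum to one, so adding up the first two gives the figure that would be quoted for a two-dimensional score plot.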
Looking at the loadings of the first principal component, we should expect to see municipalities with a relatively high proportion of votes for the traditionally rural Center Party (C) on the left. On this side we also expect to see municipalities with a high percentage of votes for the right-wing populist/xenophobic party, the Sweden Democrats (SD), while on the right we see municipalities with a high percentage of votes for the Liberals (FP) and the Green Party (MP). My interpretation is that this lends strength to the hypothesis that the x-axis shows a separation from rural to urban.
Interestingly, the second principal component seems to show a separation along traditional Swedish political lines, with the right in the bottom half of the figure and the left in the top half.
To get a feel for this, let’s add the names of the municipalities to the PCA plot.
While this figure is something of a mess, we can clearly see the names of the outliers, and they do seem to make sense. Gällivare and Jokkmokk in the north of Sweden have traditionally leaned strongly towards the left, while Danderyd, outside Stockholm, is well known as a stronghold of the Swedish right. On the left we see rural municipalities like Ragunda and Strömsund, and on the right we see the larger cities of Sweden, like Stockholm and Göteborg.
All in all, it seems that my results here concur with Mikael’s.
Mikael then goes on to do random forest classification to see if certain key parameters about the different municipalities can predict the voting outcome. I decided to take a more straightforward route here. I think that this is especially interesting given the statements made in the blog Cornucopia (in Swedish) about the correlation between the percentage of votes for SD and the number of asylum seekers in the municipality.
Lars Wilderäng claims that there is a strong positive correlation between the number of asylum seekers per capita in a municipality and the SD voting percentage, and a negative correlation with the percentage of M votes. One possibility he raises is that in municipalities with a high number of asylum seekers, voters flee from M to SD because of this. However, I think that this oversimplifies the situation.
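As a reminder of what such a correlation claim amounts to numerically, here is a small sketch that computes Pearson's correlation coefficient with NumPy. All the numbers are invented for illustration and are not taken from the election data or from Cornucopia. The variable names and the helper `pearson_r` are likewise just assumptions for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical municipalities (made-up values, one entry per municipality).
asylum_per_1000 = np.array([1.0, 2.5, 4.0, 6.5, 8.0, 12.0])
sd_share = np.array([0.09, 0.11, 0.10, 0.14, 0.13, 0.17])
m_share = np.array([0.28, 0.25, 0.26, 0.20, 0.22, 0.15])

r_sd = pearson_r(asylum_per_1000, sd_share)
r_m = pearson_r(asylum_per_1000, m_share)
print(f"asylum seekers vs SD share: r = {r_sd:.2f}")
print(f"asylum seekers vs M share:  r = {r_m:.2f}")
```

A coefficient like this only measures how closely two columns move together. On its own it says nothing about why they do, or about which of them (if either) drives the other.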
One claim that he makes is that municipalities with right-wing governments accept fewer asylum seekers. Judging by this density plot, that appears to be correct.