The Newspaper you read, your political seed

Introduction

Abstract

Donald Trump banned from Twitter: this sentence should be familiar to you. In the last ten years, there has been an explosion of polemical statements of all kinds. Most well-known newspapers have picked up these quotations and put them in their columns, and not only from Twitter… Thanks to the Quobert framework, developed by Robert West and others, we have at hand a dataset of millions of quotations from different newspapers between 2015 and 2020. The million-dollar question was then: what can we do with such a dataset?

To point out some interesting facts about these quotations, we decided to focus on three main axes of analysis. Before doing this, the key step is to read the entire dataset. Once this is done, we can work with the data much more easily! We then decided to compare newspapers affiliated with Democrats or Republicans, and to see whether there is a link between the quotations reported in the newspapers and their political positioning. Since there are many different newspapers, we decided to focus initially on only two, whose political views are well known. When comparing two newspapers, the objective is to find which parameters make it possible to tell them apart.

The three parameters we picked are topic detection, speakers, and sentiment analysis. Once the quotations of the two newspapers have been analyzed along these lines, the next step is to apply the parameters that give significant results to other journals. The aim is to produce a clear framework for comparing newspapers and stating their political affiliation.

Methods

Choice of the two reference newspapers

First, the choice of the two newspapers we work with is crucial. Some journals have “centered” opinions or nuanced positioning. That is why the focus is on polarized newspapers, which makes it easier to study and define the parameters. For our study, two newspapers were chosen: Fox News and the New York Times (NYT). Both are polarized: Fox News favors more conservative political positions and is mainly viewed by Republican partisans, while the New York Times is more left-leaning and followed mainly by Democrats (figure below, Statista, consulted on 15.11.2021). Note that the New York Times has more quotations available in the Quotebank dataset than Fox News: 894,838 quotes compared to 708,383.

Infographic: Party Affiliation Defines News Sources | Statista

Topic Detection

Number of topics for each journal

Latent Dirichlet Allocation (LDA) is an unsupervised method that builds topics, each composed of specific words, directly from the texts. The number of topics has to be specified in advance, so the ideal number for each newspaper has to be determined first. We calculated the coherence score for different numbers of topics (from 2 to 10); the plots show the results. We then took the number with the highest score and plotted the topics using pyLDAvis.

Determination of the number of topics for New York Times

Determination of the number of topics for Fox News
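The selection step described above can be sketched as a simple sweep over candidate values. This is a minimal illustration and not the code used for the study: `coherence_for` is a hypothetical callable that would train an LDA model with `k` topics and return its coherence score (for instance with gensim's `LdaModel` and `CoherenceModel`), and the stand-in scorer below returns invented values only to show the call pattern.

```python
from typing import Callable, Dict, Tuple


def sweep_num_topics(
    coherence_for: Callable[[int], float],
    k_min: int = 2,
    k_max: int = 10,
) -> Tuple[int, Dict[int, float]]:
    """Score every candidate number of topics and return the best one.

    coherence_for is a hypothetical callable: given k, it is assumed to
    train an LDA model with k topics and return its coherence score.
    """
    scores = {k: coherence_for(k) for k in range(k_min, k_max + 1)}
    # Highest score wins; ties go to the smallest k (the simpler model).
    best_k = min(scores, key=lambda k: (-scores[k], k))
    return best_k, scores


# Stand-in scorer with invented values, used only to demonstrate the sweep.
def fake_coherence(k: int) -> float:
    return 0.30 - 0.01 * abs(k - 3)


best_k, scores = sweep_num_topics(fake_coherence)
print(best_k, scores[best_k])
```

Keeping the scorer as a parameter means the same sweep can be reused for each newspaper by passing a scorer bound to that newspaper's corpus.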

Some observations can be made:

Visualisation of the number of topics

Topic Detection for the New York Times

Topic Detection for the Fox News

From now on, as shown in the two graphs, we focus exclusively on 3 topics for the New York Times and 9 for Fox News, and run the LDA with these settings.

Comparison between topics for the New York Times and Fox News

Looking at the Topic Detection by LDA for both newspapers, the following can be observed:

It’s now your turn to play with the data: click on the circle of the topic you want to display! Since the coherence coefficient is small, this result should be taken with caution (the C_V is only around 0.2 and 0.3).

What can be said about the numbers behind the quotations?

Quotes counting

We all know that there are hot topics on which Republicans and Democrats diverge. We selected some of them and associated a list of words with each. The words were mapped to the same normalized form by stripping affixes (word stemming) whenever possible. This makes it possible to cover a wider range of quotations related to the chosen topics: immigration, terrorism, climate change, abortion, religion and racism. We first calculated how many quotations are about these specific topics. The results are illustrated in the bar chart below. Fox News has in general more quotations about the selected topics. This holds for all topics except climate change, where the normalized occurrences are similar (as shown in the plot). The largest difference is for abortion, where Fox News has many more quotations. Religion also occurs more often in Fox News, which could be linked to a Republican affiliation.

Percentage of quotations by topic and newspapers
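The counting described above can be sketched as follows. This is a simplified illustration under stated assumptions: the stems in `TOPIC_STEMS` are hypothetical (the word lists actually used are not given here), and prefix matching on lowercase tokens stands in for a real stemmer.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical stems per topic, chosen for illustration only.
TOPIC_STEMS: Dict[str, Tuple[str, ...]] = {
    "immigration": ("immigr",),
    "terrorism": ("terror",),
    "climate change": ("climat",),
    "abortion": ("abort",),
    "religion": ("relig",),
    "racism": ("racis",),
}


def topics_in(quote: str) -> Set[str]:
    """Return the topics whose stems prefix at least one token of the quote."""
    tokens = re.findall(r"[a-z]+", quote.lower())
    return {
        topic
        for topic, stems in TOPIC_STEMS.items()
        if any(tok.startswith(stem) for tok in tokens for stem in stems)
    }


def topic_shares(quotes: Iterable[str]) -> Dict[str, float]:
    """Percentage of quotes mentioning each topic (a quote may count for several)."""
    counts: Counter = Counter()
    total = 0
    for quote in quotes:
        total += 1
        counts.update(topics_in(quote))
    if total == 0:
        return {topic: 0.0 for topic in TOPIC_STEMS}
    return {topic: 100.0 * counts[topic] / total for topic in TOPIC_STEMS}


# Invented sample quotes, used only to demonstrate the functions.
sample = [
    "Immigrants are welcome here.",
    "The climate is changing faster than expected.",
    "We won the game last night.",
    "Terrorists and immigration were discussed.",
]
shares = topic_shares(sample)
print(shares)
```

Dividing by the total number of quotes per newspaper is what makes the two outlets comparable despite their different dataset sizes.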

Additionally, a t-test was run for each subject. Only climate change gives a p-value higher than the significance level of 0.05, so no statistically significant difference between the two newspapers is detected for this topic.

Topic            p-value
Immigration      0.000000e+00
Terrorism        0.000000e+00
Climate change   2.331170e-01
Abortion         4.609667e-67
Religion         0.000000e+00
Racism           8.309947e-51
Key parameters p-value for both newspapers
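A test of this kind can be sketched from topic counts alone. The exact variant of the t-test used is not stated, so this sketch assumes Welch's t-test on 0/1 indicators (quote mentions the topic or not) with a two-sided p-value from the normal approximation, which is reasonable for samples of this size. The counts in the usage lines are invented.

```python
import math
from typing import Tuple


def welch_t_from_counts(k1: int, n1: int, k2: int, n2: int) -> Tuple[float, float]:
    """Welch's t statistic for two samples of 0/1 indicators.

    k = number of quotes mentioning the topic, n = total number of quotes.
    The p-value is two-sided and uses the normal approximation (an assumption
    that only holds for large n).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    p1, p2 = k1 / n1, k2 / n2
    # Unbiased sample variance of a 0/1 variable with mean p.
    var1 = p1 * (1 - p1) * n1 / (n1 - 1)
    var2 = p2 * (1 - p2) * n2 / (n2 - 1)
    se = math.sqrt(var1 / n1 + var2 / n2)
    if se == 0:
        raise ValueError("both samples are constant; the statistic is undefined")
    t = (p1 - p2) / se
    p_value = math.erfc(abs(t) / math.sqrt(2))
    return t, p_value


# Invented counts: 6% versus 4% of 10,000 quotes each.
t_stat, p_val = welch_t_from_counts(600, 10_000, 400, 10_000)
print(t_stat, p_val)
```

For very large statistics the computed p-value underflows to exactly 0.0 in floating point, which would be consistent with the entries printed as 0.000000e+00 above.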

Here once again, Fox News seems to talk more about those topics than the New York Times, which supports the hypothesis formulated in the topic detection section of this data story. This could perhaps be explained by the fact that the New York Times covers a broader range of topics.

Year analysis

We then took another direction and plotted the topic-related quotations for each journal by year. The results are shown in the plot below.
As before, one can see that, except for climate change, Fox News quotes the chosen topics more often. Furthermore, even if the peaks for Fox News are higher than those of the New York Times, the peaks coincide. This probably means that an important event happened at that time and both newspapers wrote about it.
The increase in Fox News quotations about immigration seems to happen during the period when Donald Trump was president and decided to build the wall between the US and Mexico, with a decrease in 2019, the year Joe Biden announced his candidacy. The peak of quotes about abortion in 2019 may be due to Alabama's House Bill 314, the abortion ban signed on May 15, 2019, which was to impose a near-total ban on abortion in the state starting in November 2019. Moreover, the peaks are usually much higher for Fox News than for NYT, for example in the immigration or abortion graphs. These plots also show that the differences between the two media for quotations about terrorism and religion are among the largest and are almost constant over time.

Key words over years for the New York Times and Fox News

As a second part of this analysis, we focused on the speakers:

Most cited politicians

This plot shows the most cited politicians for both newspapers. Donald Trump is by far the most quoted one: his bar remains higher than the others even on a logarithmic scale.

Republican or Democratic speakers?

Knowing the political affiliation of each speaker, we tried to see whether the New York Times would cite more Democrats in its top 10 and Fox News more Republicans. In reality, NYT has more Republicans in its top 10 speakers while Fox News has more Democrats. This is shown in the plot below:

Political party's distribution

As in the previous part, an analysis by year was made for some key politicians in each journal. The increase in quotations from Biden is easy to explain since he became a candidate for the elections. The number of quotes seems to be strongly correlated with one-off events during those years. Fox News also seems to be more extreme and to always cite the politicians more than the New York Times does.

Speakers over the years

In the following part, we dig deeper into the data to understand the sentiment behind the quotations of each newspaper.

Sentiment analysis

The sentiment analysis of the two journals was done with two libraries: NLTK and text2emotion. For the first one, the “SentimentIntensityAnalyzer” class was used; it helps determine whether the sentiment of a text is positive or negative. text2emotion, on the other hand, captures the intensity of 5 emotions: fear, happiness, anger, surprise and sadness.
The dataset was analysed in two ways: all the quotes from the New York Times and Fox News year by year, and the quotes from these two journals on certain subjects.

Year by year analysis:

This analysis showed some consistency: each year (except for 2015), the quotes from Fox News were more negative than those from the New York Times (for instance, in 2016, we got a p-value of 1.753e-125), as shown by the bar plot below.

Comparison of negative average sentiments score between Foxnews and the NY times year by year with NLTK
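The yearly averages behind such a bar plot can be sketched as a simple grouping step. This is an illustration under assumptions: each record is taken to already carry a negative score (for example the `"neg"` entry returned by NLTK's `SentimentIntensityAnalyzer.polarity_scores`), and the records below are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (newspaper, year, negative score); the record layout is an assumption.
Record = Tuple[str, int, float]


def mean_negative_by_year(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Average negative score for every (newspaper, year) pair."""
    grouped: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for paper, year, neg in records:
        grouped[(paper, year)].append(neg)
    return {key: mean(values) for key, values in sorted(grouped.items())}


# Invented records, used only to demonstrate the grouping.
records = [
    ("Fox News", 2016, 0.25),
    ("Fox News", 2016, 0.75),
    ("New York Times", 2016, 0.5),
    ("New York Times", 2017, 0.0),
]
averages = mean_negative_by_year(records)
print(averages)
```

Keeping the per-quote scores grouped by newspaper and year also gives the two samples needed for a significance test on any given year.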

For the emotions, text2emotion found that the quotations from Fox News showed more surprise and sadness than those from the New York Times (the two bar plots below illustrate this tendency). Fear is about equal between the two journals for all years except 2015 and 2020, where the quotes from Fox News show more of this emotion.

Comparison of the average surprise score year by year between Foxnews and the NY times with text2emotion

Comparison of the average sadness score year by year between Foxnews and the NY times with text2emotion

By subject analysis:

As said before, six subjects were analysed. Looking at the positive and negative sentiment values from NLTK, the ones that showed the largest differences between the two newspapers are immigration, terrorism, and racism. The emotions surprise, fear and sadness also had low p-values when comparing the distributions of values for the two journals (for instance, for surprise on the racism subject, we got a p-value of 1.712e-14).
On climate change, Fox News was a lot more negative (p-value of 1.813e-08) than the New York Times. For abortion and religion, no big differences were found. What can be noted is the impact of the subject on the values of the emotions.

Comparison of the surprise average score by subject between Foxnews and the NY times with text2emotion

Comparison of positive and negative sentiments about climate change between Foxnews and the NY times with NLTK

Sentiment analysis about Covid

Quotes mentioning Covid are cited much more often by Fox News than by NYT. Between 2019 and 2020, approximately 5.3% of Fox News quotations were about the coronavirus, against only 2.2% for NYT. Digging a bit deeper, we can see that Covid-related quotations in both media express a lot of fear, followed by sadness, a bit of anger and some happiness. The latter is surprising at first glance, but don’t forget that some people found teleworking appealing and grounding. The only significant difference in terms of emotion, looking at the p-values, is for surprise, where Fox News has a higher score.

Sentiment analysis related to Covid

Emotion    p-value
Fear       0.496
Happy      0.105
Angry      0.277
Surprise   0
Sad        0.231
Covid emotions p-value for both newspapers

Washington Post

The last step of the sentiment analysis is to try to replicate the findings on another journal, here the Washington Post, which is known to be left-leaning. Of all the differences found before, two tests on this third media company are presented here: the NLTK negative sentiment calculated year by year, and the text2emotion scores by subject. The first one (first plot below) seems to be inconclusive: the Washington Post was expected to be closer to the New York Times than to Fox News, but that is not the case here. This could be explained by the fact that each newspaper has its own way of choosing its quotations, so that each newspaper is unique and hard to compare with others.
On the other hand, the second test (second plot below) seems to be more robust and closer to what we expected. The Washington Post scores for immigration, terrorism and racism are closer to those of the New York Times than to those of Fox News. Note that in the first part of the sentiment analysis, these subjects also showed the biggest differences between the two initial journals.

Comparison year by year of positive and negative average sentiment scores between Foxnews, the NY times and the Washington Post with NLTK

Comparison by subject of the average surprise score between Foxnews, the NY times and the Washington Post with text2emotion

Conclusion

We took two polarized newspapers: Fox News (right-leaning) and the New York Times (left-leaning). Through the study of their quotations, we can clearly see a difference for some hot topics, especially immigration, terrorism and racism. Fox News seems to emphasize them and to show steeper increases or decreases from one year to the next.
The New York Times also appears to publish quotations about a wider variety of topics than Fox News, for which it is easier to define specific topics and whose topic words are more easily related to Republican themes.
For the sentiments, no clear difference is noticed apart from the fact that Fox News quotations carry more negative emotions than those of the New York Times, but this pattern is not shared by the Washington Post, for example.
Therefore, certain quotations could reflect the political bias of newspapers, but not completely. One has to be careful, since this whole study was carried out on quotations and not on the journalists' own writing. To analyze the effects of quotations further, many other subjects could be studied, speakers' affiliations could be analyzed in more depth, and other journals could be chosen, for example with less polarized opinions.


Mettler Marc, Goulart Maia Manuela, M’Saada Sinda, Charroin François
ADA, EPFL, December 2021
Title image : Pew Research Center, consulted the 17.12.2021