I remember summarizing texts in school, and it was not always an easy task. Writing useful summaries, mostly of lengthy texts, was almost an art. The skill that earned the best marks was meaningful summary writing: not just copying bits of the original text with the same words and sentence structures, but writing a meaningful text in your own words.

Yet these days you can get a computer to summarize text for you. The technical term for this is automatic summarization. This is how Wikipedia defines it:

"Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content."

So I started looking for an NLP model that supports automatic summarization and found Pegasus, a deep learning model for text summarization. The Pegasus paper came out on December 18, 2019, and is the result of a co-operation between the Data Science Institute at Imperial College London, the Google UK Brain Team and Google Research.

Abstractive Summarization

The Pegasus paper focuses on "abstractive summarization" which may create new words during the summarization process. This seems to be the goal set by the Pegasus paper:

"In contrast to extractive summarization which merely copies informative fragments from the input, abstractive summarization may generate novel words. A good abstractive summary covers principal information in the input and is linguistically fluent."

Actually "abstractive summarization" was exactly what was considered to be a good summarization practice in school. So I decided to try it out.

Hugging Face Transformers

My favourite NLP library, Hugging Face Transformers, comes with a pre-trained Pegasus model in versions 3.1.0, 3.2.0 and now 3.3.0. So I decided to see how it performs on some random Wikipedia sentences (Wikipedia has a REST API that delivers random English sentences).
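For readers who want to try this themselves, a minimal sketch of calling a pre-trained Pegasus checkpoint through Transformers could look like the following. The checkpoint name google/pegasus-xsum is an assumption chosen for illustration and is not necessarily the variant used in this post; the helper names load_pegasus and summarize are hypothetical. The tokenizer call style follows more recent Transformers releases and may need adjusting for older versions.

```python
from typing import List

# Assumed checkpoint for illustration only; other Pegasus variants exist.
MODEL_NAME = "google/pegasus-xsum"


def load_pegasus(model_name: str = MODEL_NAME):
    """Load tokenizer and model (downloads the weights on first use)."""
    # Imported lazily; needs the transformers, torch and sentencepiece packages.
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def summarize(texts: List[str], tokenizer, model) -> List[str]:
    """Return one generated summary per input text."""
    # Tokenize the whole batch, truncating inputs that exceed the model limit.
    batch = tokenizer(texts, truncation=True, padding="longest", return_tensors="pt")
    # generate() returns the token ids of the summaries.
    summary_ids = model.generate(**batch)
    # Turn the token ids back into plain strings.
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)


# Example usage (downloads a large model, so it is left commented out):
# tokenizer, model = load_pegasus()
# print(summarize(["The Tupi or Tupian language family comprises some 70 "
#                  "languages spoken in South America."], tokenizer, model))
```

Passing the tokenizer and model into summarize keeps the expensive loading step separate, so the same objects can be reused across many samples.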

My questions about this state-of-the-art model were:

  • Does it perform well in terms of reducing some random text?
  • Does it produce correct output in terms of spelling?
  • Does the content make sense for a human being (me)?

My experiments with the model are all in a Google Colab Jupyter notebook. Also, note that I did not use the largest of the Pegasus models. This is the model I have used:

Does it perform well in terms of reducing some random text?

I started by summarizing some samples which I have found in the Pegasus paper. Here is one of them:

"She will play Denker, a lady's maid to Dame Maggie Smith's character, the Dowager Countess of Grantham. Johnston, who has also appeared in Waking the Dead and Coronation Street, joins new stars Richard E Grant and Anna Chancellor, both of whom will play guests of the Granthams at Downton. The hit period drama will return to screens this autumn. Series four of the show, which followed the wealthy Grantham family and their servants, achieved an average of 11.9 million viewers in the UK. The very British drama has also been a huge hit in the US, winning both Emmy Awards and Golden Globes. More than 26 million viewers watched series four on Masterpiece on PBS, making it one of the highest rating shows on American television. Previous high profile guest stars include Shirley Maclaine who played Martha Levinson, Lady Grantham's mother, and Oscar-nominated actor Paul Giamatti who appeared in last year's Christmas special as her" maverick, playboy" son. Series five will also feature 24 star Rade Sherbedgia as a Russian refugee who has fled the revolution after World War 1. Earlier this year, executive producer Gareth Neame promised it would have" all the usual highs and lows, romance, drama and comedy."

The model summarized it like this:

"Former EastEnders actress Janet Johnston has been cast as a guest star in the fifth series of Downton Abbey."

It really looks impressive in terms of reducing text, but there is something wrong here: it turns out that the model "invented" the name Janet Johnston. There was an Australian actress with that name, but she died in 1983, and the BBC soap opera EastEnders only started in 1985.

So whilst the text is syntactically correct and superficially looks ok, it is giving you wrong information.
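One rough way to spot candidates for such invented names is to list the capitalized words in the summary that never occur in the original. The sketch below is only a heuristic and not the check used in this post; the function name novel_capitalized_words is hypothetical.

```python
import re
from typing import List


def novel_capitalized_words(original: str, summary: str) -> List[str]:
    """Return capitalized summary words that never occur in the original."""
    # Lower-cased vocabulary of the source text.
    source_vocab = {w.lower() for w in re.findall(r"[A-Za-z']+", original)}
    novel: List[str] = []
    for word in re.findall(r"[A-Za-z']+", summary):
        # Capitalized words are a rough proxy for names and other entities.
        is_new = word.lower() not in source_vocab
        if word[0].isupper() and is_new and word not in novel:
            novel.append(word)
    return novel


original = (
    "Johnston, who has also appeared in Waking the Dead and Coronation "
    "Street, joins new stars Richard E Grant and Anna Chancellor, both of "
    "whom will play guests of the Granthams at Downton."
)
summary = (
    "Former EastEnders actress Janet Johnston has been cast as a guest "
    "star in the fifth series of Downton Abbey."
)
print(novel_capitalized_words(original, summary))
# ['Former', 'EastEnders', 'Janet', 'Abbey']
```

The list contains harmless hits such as the sentence-initial "Former", so it can only point a human reader at words worth checking; it cannot decide whether a statement is true.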

I have generated more examples that I will show later in this post.

The mean summary ratio on 200 samples from Wikipedia is around 0.41 (41%). This means that the summary text produced by the Pegasus model, in the variant I used, is on average about 41% shorter than the original.
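As a sketch of how such a figure can be computed: the code below assumes the ratio measures how much shorter the summary is than the original, counted in characters. The post does not spell out the exact definition, so both the formula and the function names reduction_ratio and mean_reduction are assumptions.

```python
from statistics import mean
from typing import List, Tuple


def reduction_ratio(original: str, summary: str) -> float:
    """Fraction by which the summary is shorter than the original (in characters)."""
    if not original:
        raise ValueError("original text must not be empty")
    return 1.0 - len(summary) / len(original)


def mean_reduction(pairs: List[Tuple[str, str]]) -> float:
    """Average reduction ratio over (original, summary) pairs."""
    return mean(reduction_ratio(o, s) for o, s in pairs)
```

With this definition, a 100-character text summarized in 59 characters has a ratio of about 0.41. Counting words or tokens instead of characters would give somewhat different numbers.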

Here is a plot showing the distribution of text lengths for the originals and for the summaries:

So you can see that Pegasus can indeed reduce the length of the text.

Does it produce correct output in terms of spelling?

In my Google Colab Jupyter notebook, there is a check for the number of spelling mistakes. I counted the spelling mistakes in the original and in the summarized text, and the ratio of misspelt words is roughly the same:

So the output of the model is misspelt roughly as often as the original text. In my mini test, this means that the Pegasus summaries are not worse than the original text in terms of spelling.
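A comparison like this can be sketched as a dictionary lookup: count the words that are missing from a word list and divide by the total number of words. The code below is an illustration under that assumption, not the check from the notebook; misspelt_ratio and the tiny VOCAB set are hypothetical, and a real check would need a full dictionary.

```python
import re
from typing import Set


def misspelt_ratio(text: str, vocabulary: Set[str]) -> float:
    """Share of words in `text` that are missing from `vocabulary`."""
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in vocabulary)
    return unknown / len(words)


# Tiny hypothetical word list, for illustration only.
VOCAB = {"the", "bar", "is", "legal", "profession"}

# One of the six words ("profesion") is not in the word list.
print(misspelt_ratio("The bar is the legal profesion", VOCAB))
```

Running the same function on the original and on the summary gives two ratios that can be compared directly. Note that proper nouns count as misspelt unless the word list contains them, which affects both sides of the comparison.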

Does the content make sense for a human being (me)?

It is almost always readable - in the sense that you have full English sentences - but the problem here is that the summaries come with false information, like dead actresses performing in soap operas after their death.

Let us look at some more summary examples that look correct but deliver false or slightly dodgy information (the note in parentheses after each pair explains what is wrong):

Original: Gunnison is a city in the Sevier Valley in southwestern in Sanpete County, Utah, United States. The population was 3,285 at the 2010 census. The city was named in honour of John W. Gunnison, a United States Army officer who surveyed for the transcontinental railroad in 1853.
Summary: Sanpete is a city in Utah, United States.
(Sanpete is the county, not the city)

Original: The Tupi or Tupian language family comprises some 70 languages spoken in South America, of which the best known are Tupi proper and Guarani.
Summary: Tupi is one of the most widely spoken languages in South America.
(Tupi is not in the top 10 of the most widely spoken languages in Latin America)

Original: HMS Australia was one of seven Orlando-class armoured cruisers built for the Royal Navy in the mid-1880s. She was assigned to the Mediterranean Fleet in 1889 and remained there until 1893 when she returned home. The ship was assigned to the Coast Guard Squadron for the next decade before she was placed in reserve in 1903. Australia was sold for scrap in 1905.
Summary: A former Royal Navy cruiser has gone on display at the National Maritime Museum in Portsmouth.
(The cruiser HMS Australia was sold for scrap in 1905 and did not land in a museum)

Original: In law, the bar is the legal profession as an institution. The term is a metonym for the line that separates the parts of a courtroom reserved for spectators and those reserved for participants in a trial, such as lawyers.
Summary: The bar is the name given to the legal profession in England and Wales.
(England and Wales? Where did that come from?)

Original: Gregory Edward Smith is a Canadian actor, writer, and director. Smith has appeared in several Hollywood films, and is known for his roles as Alan Abernathy in Small Soldiers, Ephram Brown on the television series Everwood, and more recently as Dov Epstein on the police drama series Rookie Blue.
Summary: In our series of letters from African-Canadian journalists, film-maker and columnist Don Riddell looks at the life and work of actor and writer Gregory Smith.
(There is no mention of letters from African-Canadian journalists)

In most cases you get decent summaries, but you also get dodgy ones with invented facts. And sometimes the summarization seems to have simply failed, and you get a placeholder type of summary, like:

All images are copyrighted.

Conclusion

The promise of abstractive automated text summarization is fascinating and is partly fulfilled by the Pegasus model. Yet I have the feeling that the version of the model which I tested is not yet there in terms of delivering results that are always meaningful and correct. Whilst the text is almost always readable, it sometimes comes with wrong or inaccurate information, and on other occasions it does not make much sense.

I invite the readers to check some of the summary samples that I generated in the Google Colab Jupyter notebook to get an idea about the capabilities of this model.