<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>text summmarization Archives - Turbolab Technologies</title>
	<atom:link href="https://turbolab.in/tag/text-summmarization/feed/" rel="self" type="application/rss+xml" />
	<link>https://turbolab.in/tag/text-summmarization/</link>
	<description>Big Data and News Analysis Startup in Kochi</description>
	<lastBuildDate>Fri, 03 Dec 2021 05:40:41 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://i0.wp.com/turbolab.in/wp-content/uploads/2018/03/turbo_black_trans-space.png?fit=32%2C32&#038;ssl=1</url>
	<title>text summmarization Archives - Turbolab Technologies</title>
	<link>https://turbolab.in/tag/text-summmarization/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">98237731</site>	<item>
		<title>Abstractive Summarization Using Google&#8217;s T5</title>
		<link>https://turbolab.in/abstractive-summarization-using-googles-t5/</link>
					<comments>https://turbolab.in/abstractive-summarization-using-googles-t5/#respond</comments>
		
		<dc:creator><![CDATA[Vasista Reddy]]></dc:creator>
		<pubDate>Mon, 04 Oct 2021 04:04:00 +0000</pubDate>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[abstractive summarization]]></category>
		<category><![CDATA[bert]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[text summmarization]]></category>
		<guid isPermaLink="false">https://turbolab.in/?p=592</guid>

					<description><![CDATA[<p>In this article, we will discuss abstractive summarization using T5, and how it is different from BERT-based models. T5 (Text-To-Text Transfer Transformer) is a transformer model that is trained in an end-to-end manner with text as input and modified text as output, in contrast to BERT-style models that can only output either a class label [&#8230;]</p>
<p>The post <a href="https://turbolab.in/abstractive-summarization-using-googles-t5/">Abstractive Summarization Using Google&#8217;s T5</a> appeared first on <a href="https://turbolab.in">Turbolab Technologies</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400">In this article, we will discuss abstractive summarization using T5, and how it is different from BERT-based models.</span></p>
<p><span style="font-weight: 400">T5 (Text-To-Text Transfer Transformer) is a transformer model that is trained in an end-to-end manner with </span><b>text as input</b><span style="font-weight: 400"> and modified </span><b>text as output</b><span style="font-weight: 400">,</span><span style="font-weight: 400"> in contrast to BERT-style models that can only output either a class label or a span of the input. </span><span style="font-weight: 400">This text-to-text formatting makes the T5 model fit for multiple <strong>NLP</strong> tasks like <strong>Summarization</strong>, <strong>Question-Answering</strong>, <strong>Machine Translation</strong>, and <strong>Classification</strong> problems.</span></p>
<h2><span style="font-weight: 400">How is T5 different from BERT?</span></h2>
<p><span style="font-weight: 400">Both T5 and BERT are pre-trained with a masked language modeling (MLM) objective. </span></p>
<p><strong>What is MLM? </strong></p>
<p><span style="font-weight: 400">MLM is a fill-in-the-blank task: part of the input text is masked, and the model tries to predict the masked words.</span></p>
<p><span style="font-weight: 400">Example:</span></p>
<ul>
<li><b><i>“I like to eat peanut butter and &lt;MASK&gt; sandwiches,”</i></b></li>
</ul>
<ul>
<li><b><i>“I like to eat peanut butter and </i></b><b>jelly</b><b><i> sandwiches,”</i></b></li>
</ul>
<p><span style="font-weight: 400">A key difference is that T5 replaces each run of consecutive masked tokens with a single mask token, whereas BERT uses one mask token for each masked token. The illustration below shows this.</span></p>
<figure id="attachment_593" aria-describedby="caption-attachment-593" style="width: 591px" class="wp-caption alignnone"><img data-recalc-dims="1" fetchpriority="high" decoding="async" data-attachment-id="593" data-permalink="https://turbolab.in/abstractive-summarization-using-googles-t5/mlm/" data-orig-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/mlm.png?fit=591%2C266&amp;ssl=1" data-orig-size="591,266" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="mlm" data-image-description="" data-image-caption="" data-large-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/mlm.png?fit=591%2C266&amp;ssl=1" class="wp-image-593 size-full" src="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/mlm.png?resize=591%2C266&#038;ssl=1" alt="" width="591" height="266" srcset="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/mlm.png?w=591&amp;ssl=1 591w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/mlm.png?resize=300%2C135&amp;ssl=1 300w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/mlm.png?resize=480%2C216&amp;ssl=1 480w" sizes="(max-width: 591px) 100vw, 591px" /><figcaption id="caption-attachment-593" class="wp-caption-text">Source: Journal of Machine Learning</figcaption></figure>
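<p>The two masking styles can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the masked positions are picked by hand instead of sampled, and the T5 sentinels are written as [S0], [S1] for readability (the real T5 vocabulary has its own sentinel tokens, and the real target also ends with a closing sentinel, which is omitted here).</p>

```python
def bert_style_mask(tokens, masked_positions, mask_token="[MASK]"):
    """Replace every masked position with its own mask token."""
    masked = set(masked_positions)
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]


def t5_style_mask(tokens, masked_positions):
    """Collapse each run of consecutive masked positions into one sentinel.

    Returns the corrupted input and the target the model learns to emit.
    Sentinels are written as [S0], [S1], ... purely for illustration.
    """
    masked = set(masked_positions)
    corrupted, target = [], []
    sentinel_id = 0
    previous_was_masked = False
    for i, tok in enumerate(tokens):
        if i in masked:
            if not previous_was_masked:
                # A new masked span starts: emit one sentinel for the whole span.
                sentinel = f"[S{sentinel_id}]"
                sentinel_id += 1
                corrupted.append(sentinel)
                target.append(sentinel)
            target.append(tok)
            previous_was_masked = True
        else:
            corrupted.append(tok)
            previous_was_masked = False
    return corrupted, target


tokens = "I like to eat peanut butter and jelly sandwiches".split()
positions = [4, 5, 7]  # "peanut", "butter", "jelly"

print(" ".join(bert_style_mask(tokens, positions)))
corrupted, target = t5_style_mask(tokens, positions)
print(" ".join(corrupted))
print(" ".join(target))
```

<p>With these hand-picked positions, the BERT-style input contains three mask tokens, while the T5-style input contains only two sentinels, one per masked span.</p>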
<h2><span style="font-weight: 400">About T5 Models</span></h2>
<p><span style="font-weight: 400">Google has released pre-trained T5 models, which were trained on a large unlabelled text corpus called C4 (Colossal Clean Crawled Corpus). C4 contains 800 GB of cleaned text extracted from the web. The cleaning process involves deduplication, discarding incomplete sentences, and removing offensive or noisy content.</span></p>
<p><b>You can get these T5 pre-trained models from the </b><a href="https://huggingface.co/models?search=T5"><b>HuggingFace website</b></a><b>:</b></p>
<ol>
<li><span style="font-weight: 400">   T5-small with 60 million parameters.</span></li>
<li><span style="font-weight: 400">   T5-base with 220 million parameters.</span></li>
<li><span style="font-weight: 400">   T5-large with 770 million parameters.</span></li>
<li><span style="font-weight: 400">   T5-3B with 3 billion parameters.</span></li>
<li><span style="font-weight: 400">   T5-11B with 11 billion parameters.</span></li>
</ol>
<p><img data-recalc-dims="1" decoding="async" data-attachment-id="594" data-permalink="https://turbolab.in/abstractive-summarization-using-googles-t5/capture/" data-orig-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?fit=1318%2C491&amp;ssl=1" data-orig-size="1318,491" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;vasista reddy&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;1632147683&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="t5 models" data-image-description="" data-image-caption="" data-large-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?fit=800%2C298&amp;ssl=1" class="alignnone size-full wp-image-594" src="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=800%2C298&#038;ssl=1" alt="" width="800" height="298" srcset="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?w=1318&amp;ssl=1 1318w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=300%2C112&amp;ssl=1 300w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=768%2C286&amp;ssl=1 768w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=1024%2C381&amp;ssl=1 1024w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=1080%2C402&amp;ssl=1 1080w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=1280%2C477&amp;ssl=1 1280w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=980%2C365&amp;ssl=1 980w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Capture.jpg?resize=480%2C179&amp;ssl=1 480w" sizes="(max-width: 800px) 100vw, 800px" /></p>
<p><span style="font-weight: 400">T5 expects a prefix before the input text that tells it which task to perform. For example, &#8220;<strong>summarize</strong>:&#8221; for summarization, &#8220;<strong>cola sentence:</strong>&#8221; for classification (linguistic acceptability), and &#8220;<strong>translate</strong> English to Spanish:&#8221; for machine translation. The image below illustrates this.</span></p>
<figure id="attachment_595" aria-describedby="caption-attachment-595" style="width: 744px" class="wp-caption alignnone"><img data-recalc-dims="1" decoding="async" data-attachment-id="595" data-permalink="https://turbolab.in/abstractive-summarization-using-googles-t5/1_xch7mi0d_v3vvdipu-svkq-744x328/" data-orig-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/1_xCh7mi0D_V3vvdIpU-sVKQ-744x328.png?fit=744%2C328&amp;ssl=1" data-orig-size="744,328" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="1_xCh7mi0D_V3vvdIpU-sVKQ-744&amp;#215;328" data-image-description="" data-image-caption="&lt;p&gt;Source: Google AI Blog&lt;/p&gt;
" data-large-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/1_xCh7mi0D_V3vvdIpU-sVKQ-744x328.png?fit=744%2C328&amp;ssl=1" class="size-full wp-image-595" src="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/1_xCh7mi0D_V3vvdIpU-sVKQ-744x328.png?resize=744%2C328&#038;ssl=1" alt="" width="744" height="328" srcset="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/1_xCh7mi0D_V3vvdIpU-sVKQ-744x328.png?resize=744%2C328&amp;ssl=1 744w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/1_xCh7mi0D_V3vvdIpU-sVKQ-744x328.png?resize=300%2C132&amp;ssl=1 300w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/1_xCh7mi0D_V3vvdIpU-sVKQ-744x328.png?resize=480%2C212&amp;ssl=1 480w" sizes="(max-width: 744px) 100vw, 744px" /><figcaption id="caption-attachment-595" class="wp-caption-text">Source: Google AI Blog</figcaption></figure>
<p>Every task we consider uses text as input to the model, which is trained to generate some target text. This allows us to use the same model, loss function, and hyperparameters across our diverse set of tasks including translation (green), linguistic acceptability (red), sentence similarity (yellow), and document summarization (blue).</p>
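<p>As a small sketch of the prefix idea, the helper below prepends a task prefix to the raw text. The prefixes are the ones quoted in this article; the dictionary keys and the function name are hypothetical and chosen only for this example.</p>

```python
# Task prefixes as quoted in the article; the keys are illustrative names.
TASK_PREFIXES = {
    "summarization": "summarize: ",
    "acceptability": "cola sentence: ",
    "translation_en_es": "translate English to Spanish: ",
}


def build_t5_input(task, text):
    """Prepend the task prefix so the model knows which task to perform."""
    try:
        prefix = TASK_PREFIXES[task]
    except KeyError:
        raise ValueError(f"Unknown task: {task!r}") from None
    return prefix + text.strip()


print(build_t5_input("summarization", "Huawei overtook Samsung in Q2 2020."))
```

<p>The resulting string is what gets passed to the tokenizer, so the same model can be steered to different tasks without changing any code around it.</p>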
<p><span style="font-weight: 400">Besides the improved transformer architecture and massive unsupervised training data, better decoding methods have also played an important role. Currently, the most prominent decoding methods are </span><b>Greedy Search</b><span style="font-weight: 400">, </span><b>Beam Search</b><span style="font-weight: 400">, </span><b>Top-K Sampling,</b><span style="font-weight: 400"> and </span><b>Top-p Sampling</b><span style="font-weight: 400">. </span></p>
<p><span style="font-weight: 400">See this </span><a href="https://huggingface.co/blog/how-to-generate">HuggingFace blog post</a><span style="font-weight: 400"> for detailed information about these methods.</span></p>
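<p>To make three of these methods concrete, here is a minimal sketch of a single decoding step over a toy next-token distribution. The probabilities and all function names are made up for illustration; a real model produces a distribution over its whole vocabulary at every step, and Beam Search, which tracks several candidate sequences at once, is not covered by this sketch.</p>

```python
import random


def greedy_pick(probs):
    """Greedy search step: take the single most likely token."""
    return max(probs, key=probs.get)


def top_k_filter(probs, k):
    """Top-K: keep the k most likely tokens and renormalise."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs, p):
    """Top-p: keep the smallest set of top tokens whose cumulative mass reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


def sample(probs, rng):
    """Draw one token according to the (filtered) distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Toy distribution for the blank in "peanut butter and ___ sandwiches".
next_token_probs = {"jelly": 0.5, "jam": 0.25, "honey": 0.125, "cheese": 0.125}
rng = random.Random(0)

print(greedy_pick(next_token_probs))
print(sorted(top_k_filter(next_token_probs, k=2)))
print(sorted(top_p_filter(next_token_probs, p=0.8)))
print(sample(top_k_filter(next_token_probs, k=2), rng))
```

<p>Greedy search always returns the same token, while the two sampling methods first shrink the candidate set and then draw from it at random, which trades some predictability for variety.</p>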
<h2><span style="font-weight: 400">Using T5 through HuggingFace Transformers</span></h2>
<p><span style="font-weight: 400">HuggingFace Transformers is an open-source NLP library for loading and using pre-trained models, much as scikit-learn is a library for classical machine learning algorithms.</span></p>
<p><span style="font-weight: 400">We define the content we are going to summarize. </span></p>
<blockquote><p><em><span style="font-weight: 400">content = </span><span style="font-weight: 400">&#8220;China’s Huawei overtook Samsung Electronics as the world’s biggest seller of mobile phones in the second quarter of 2020, shipping 55.8 million devices compared to Samsung’s 53.7 million, according to data from research firm Canalys. While Huawei’s sales fell 5 per cent from the same quarter a year earlier, South Korea’s Samsung posted a bigger drop of 30 per cent, owing to disruption from the coronavirus in key markets such as Brazil, the United States and Europe, Canalys said. Huawei’s overseas shipments fell 27 per cent in Q2 from a year earlier, but the company increased its dominance of the China market which has been faster to recover from COVID-19 and where it now sells over 70 per cent of its phones. “Our business has demonstrated exceptional resilience in these difficult times,” a Huawei spokesman said. “Amidst a period of unprecedented global economic slowdown and challenges, we’re continued to grow and further our leadership position.” Nevertheless, Huawei’s position as number one seller may prove short-lived once other markets recover given it is mainly due to economic disruption, a senior Huawei employee with knowledge of the matter told Reuters. Apple is due to release its Q2 iPhone shipment data on Friday.&#8221;</span></em></p></blockquote>
<h3>Importing the necessary packages</h3>
<blockquote><p>from transformers import T5Tokenizer, T5ForConditionalGeneration</p></blockquote>
<h3>Loading the tokenizer and model architecture with weights</h3>
<blockquote><p>T5_PATH = 't5-large' # T5 model name</p>
<p># initialize the model architecture and weights</p>
<p>t5_model = T5ForConditionalGeneration.from_pretrained(T5_PATH)</p>
<p># initialize the model tokenizer</p>
<p>t5_tokenizer = T5Tokenizer.from_pretrained(T5_PATH)</p></blockquote>
<p><span style="font-weight: 400">The pre-trained model used here is t5-large. The other T5 checkpoints are listed above.</span></p>
<h3>Encode the text</h3>
<blockquote><p><span style="font-weight: 400"># encode the text into tensor of integers using the tokenizer</span></p>
<p><em><span style="font-weight: 400">inputs = t5_tokenizer.encode("summarize: " + content, return_tensors="pt", max_length=512, padding='max_length', truncation=True)</span></em></p></blockquote>
<h3>Generate the summarized text and decode it</h3>
<blockquote><p><em><span style="font-weight: 400"># min_length and max_length are integers; the values we tried are listed below</span></em></p>
<p><em><span style="font-weight: 400">summary_ids = t5_model.generate(inputs,</span></em></p>
<p><em><span style="font-weight: 400">                                    num_beams=2,</span></em></p>
<p><em><span style="font-weight: 400">                                    no_repeat_ngram_size=3,</span></em></p>
<p><em><span style="font-weight: 400">                                    length_penalty=2.0,</span></em></p>
<p><em><span style="font-weight: 400">                                    min_length=min_length,</span></em></p>
<p><em><span style="font-weight: 400">                                    max_length=max_length,</span></em></p>
<p><em><span style="font-weight: 400">                                    early_stopping=True)</span></em></p>
<p><em><span style="font-weight: 400">output = t5_tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)</span></em></p></blockquote>
<p><span style="font-weight: 400">The decoding method used here is </span><b>Beam Search</b><span style="font-weight: 400"> with </span><b>num_beams </b><span style="font-weight: 400">set to 2.</span></p>
<p><span style="font-weight: 400">With </span><b>min_length 50 </b><span style="font-weight: 400">and </span><b>max_length 50</b><span style="font-weight: 400">, the output is:</span></p>
<blockquote><p><i><span style="font-weight: 400">&#8220;Huawei overtakes Samsung as world&#8217;s biggest seller of mobile phones in second quarter of 2020. Company shipped 55.8 million devices compared to Samsung&#8217;s 53.7 million, Canalys says. Sales of Huawei&#8217;s&#8221;</span></i></p></blockquote>
<p><span style="font-weight: 400">and the time taken to generate the summary is 8.07 seconds on a 16-core CPU host.</span></p>
<p><span style="font-weight: 400">With </span><b>min_length 50 </b><span style="font-weight: 400">and </span><b>max_length 100</b><span style="font-weight: 400">, the output is:</span></p>
<blockquote><p><i><span style="font-weight: 400">&#8220;Huawei overtakes Samsung as world&#8217;s biggest seller of mobile phones in second quarter of 2020. Company shipped 55.8 million devices compared to Samsung&#8217;s 53.7 million, Canalys says. Sales fell 5% from same quarter a year earlier, owing to disruption from coronavirus. But company increased its dominance of the china market which has been faster to recover from COVID-19.&#8221;</span></i></p></blockquote>
<p><span style="font-weight: 400">and the time taken to generate the summary is 14.32 seconds on a 16-core CPU host.</span></p>
<p><span style="font-weight: 400">With </span><b>min_length 100 </b><span style="font-weight: 400">and </span><b>max_length 200</b><span style="font-weight: 400">, the output is:</span></p>
<blockquote><p><i><span style="font-weight: 400">&#8220;Huawei overtakes Samsung as world&#8217;s biggest seller of mobile phones in second quarter of 2020. Company shipped 55.8 million devices compared to Samsung&#8217;s 53.7 million, Canalys says. Sales fell 5% from same quarter a year earlier, owing to disruption from coronavirus. But Huawei increased its dominance of the china market which has been faster to recover from COVID-19.. Apple is due to release its Q2 iPhone shipment data on friday.&#8221;</span></i></p></blockquote>
<p><span style="font-weight: 400">and the time taken to generate the summary is 23.15 seconds on a 16-core CPU host.</span></p>
<p><span style="font-weight: 400">Increasing any of the parameters </span><b>num_beams</b><span style="font-weight: 400">, </span><b>min_length, </b><span style="font-weight: 400">and </span><b>max_length</b><span style="font-weight: 400"> increases the time taken to generate the summary.</span></p>
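<p>If you want to measure this on your own machine, a small timing helper is enough. The sketch below is only one possible way to do it: the helper names are hypothetical, and fake_generate is a stand-in so the example runs without downloading a model. In practice you would pass a function that wraps the generate call shown earlier.</p>

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_settings(generate_fn, settings):
    """Time generate_fn once for each keyword-argument dict in settings."""
    rows = []
    for kwargs in settings:
        _, elapsed = time_call(generate_fn, **kwargs)
        rows.append((kwargs, elapsed))
    return rows


def fake_generate(num_beams, min_length, max_length):
    """Stand-in for a real generation call; it only sleeps briefly."""
    time.sleep(0.001 * num_beams)
    return "summary"


# The three length settings tried in this article, all with num_beams=2.
settings = [
    {"num_beams": 2, "min_length": 50, "max_length": 50},
    {"num_beams": 2, "min_length": 50, "max_length": 100},
    {"num_beams": 2, "min_length": 100, "max_length": 200},
]

for kwargs, elapsed in compare_settings(fake_generate, settings):
    print(kwargs, f"{elapsed:.4f}s")
```

<p>A single run per setting is noisy, so repeating each measurement a few times and averaging gives a fairer comparison.</p>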
<h2><span style="font-weight: 400">Conclusion</span></h2>
<p><span style="font-weight: 400">In this article, we have used the Beam Search decoding method. For a better summary, we suggest increasing the beam value and trying the other decoding methods mentioned (<b>Greedy Search</b>, <b>Top-K Sampling</b>, and <b>Top-p Sampling</b>). </span></p>
<p><span style="font-weight: 400">With Pegasus, we can only perform abstractive summarization, but T5 can perform various NLP tasks such as classification (e.g., sentiment analysis), question answering, machine translation, and document summarization. We recommend trying T5 on these other NLP tasks as well.</span></p>
]]></content:encoded>
					
					<wfw:commentRss>https://turbolab.in/abstractive-summarization-using-googles-t5/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">592</post-id>	</item>
		<item>
		<title>Types of Text Summarization: Extractive and Abstractive Summarization Basics</title>
		<link>https://turbolab.in/types-of-text-summarization-extractive-and-abstractive-summarization-basics/</link>
					<comments>https://turbolab.in/types-of-text-summarization-extractive-and-abstractive-summarization-basics/#respond</comments>
		
		<dc:creator><![CDATA[Vasista Reddy]]></dc:creator>
		<pubDate>Mon, 20 Sep 2021 08:19:52 +0000</pubDate>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[text summmarization]]></category>
		<guid isPermaLink="false">https://turbolab.in/?p=582</guid>

					<description><![CDATA[<p>Summarization is one of the most common Natural Language Processing (NLP) tasks. With billions of people and their smartphones generating new content every day, we are inundated with an ever-increasing amount of data. Humans can only consume a finite amount of information and need a way to filter out the wheat [&#8230;]</p>
<p>The post <a href="https://turbolab.in/types-of-text-summarization-extractive-and-abstractive-summarization-basics/">Types of Text Summarization: Extractive and Abstractive Summarization Basics</a> appeared first on <a href="https://turbolab.in">Turbolab Technologies</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Summarization is one of the most common <strong>Natural Language Processing</strong> (NLP) tasks. With billions of people and their smartphones generating new content every day, we are inundated with an ever-increasing amount of data. Humans can only consume a finite amount of information and need a way to separate the wheat from the chaff and find the information that matters. Text summarization can help achieve that for textual information. It lets us separate the signal from the noise and act on what matters.</p>
<p>In this article, we explore different methods to implement this task and share some of the lessons we learned along the way. We hope this will be helpful to other folks who would like to implement basic summarization in their data science pipeline for solving different business problems.</p>
<p>Python provides some excellent libraries and modules to perform Text Summarization. We will provide a simple example of generating Extractive Summarization using the Gensim and HuggingFace modules in this article.</p>
<p>&nbsp;</p>
<h2><strong>Uses of Summarization</strong></h2>
<p>&nbsp;</p>
<p>It may be tempting to use summarization for all texts to get useful information from them and spend less time reading. However, for now, NLP summarization has been a successful use case in only a few areas.</p>
<p>Text summarization works great if a text contains a lot of raw facts, since it can filter the important information out of them. NLP models can summarize long documents and represent them in shorter, simpler sentences. News, factsheets, and mailers fall under this category.</p>
<p>However, for texts where each sentence builds upon the previous one, text summarization does not work that well. Research journals and medical texts are good examples of texts where summarization might not be very successful.</p>
<p>Finally, if we take the case of summarizing fiction, summarization methods can work fine. However, they might miss the style and the tone that the author tried to express.</p>
<p>Hence, Text summarization is helpful only in a handful of use cases.</p>
<p>&nbsp;</p>
<h2><strong>Two Types Of Summarization</strong></h2>
<p>&nbsp;</p>
<p>There are two main types of text summarization: extractive and abstractive.</p>
<p><img data-recalc-dims="1" loading="lazy" decoding="async" data-attachment-id="587" data-permalink="https://turbolab.in/types-of-text-summarization-extractive-and-abstractive-summarization-basics/extractive-vs-abstractive-summarization/" data-orig-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?fit=964%2C693&amp;ssl=1" data-orig-size="964,693" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Extractive-vs-Abstractive-Summarization" data-image-description="" data-image-caption="" data-large-file="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?fit=800%2C575&amp;ssl=1" class="alignnone size-full wp-image-587" src="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?resize=800%2C575&#038;ssl=1" alt="" width="800" height="575" srcset="https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?w=964&amp;ssl=1 964w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?resize=300%2C216&amp;ssl=1 300w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?resize=768%2C552&amp;ssl=1 768w, https://i0.wp.com/turbolab.in/wp-content/uploads/2021/09/Extractive-vs-Abstractive-Summarization.jpg?resize=480%2C345&amp;ssl=1 480w" sizes="(max-width: 800px) 100vw, 800px" /></p>
<h3><strong>Extractive</strong></h3>
<p>&nbsp;</p>
<p>Extractive summarization works just as the name suggests. It takes the text, ranks all the sentences by their relevance to the text as a whole, and presents you with the most important sentences.</p>
<p>This method does not create new words or phrases; it only selects words and phrases that already exist in the text. You can imagine this as taking a page of text and marking the most important sentences using a highlighter.</p>
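<p>The highlighter idea can be sketched with nothing but the standard library. The example below is a deliberately simple illustration, not the method used by any particular library: it scores each sentence by the average frequency of its words in the whole text and keeps the top-scoring sentences in their original order. The function name, the naive sentence splitter (which would break on decimals such as 55.8), and the sample text are all assumptions made for this sketch.</p>

```python
import re
from collections import Counter


def extractive_summary(text, num_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    # Naive sentence split: a run of non-terminators plus an optional terminator.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]
    # Word frequencies over the whole text (no stop-word removal in this sketch).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]


sample_text = (
    "Huawei overtook Samsung in phone sales. "
    "Huawei sales fell slightly. "
    "Samsung sales fell sharply. "
    "Apple reports on Friday."
)
for sentence in extractive_summary(sample_text, num_sentences=2):
    print(sentence)
```

<p>Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary.</p>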
<p>&nbsp;</p>
<h3><strong>Abstractive</strong></h3>
<p>&nbsp;</p>
<p>Abstractive summarization, on the other hand, tries to guess the meaning of the whole text and presents the meaning to you.</p>
<p>It creates words and phrases, puts them together in a meaningful way, and along with that, adds the most important facts found in the text. This way, abstractive summarization techniques are more complex than extractive summarization techniques and are also computationally more expensive.</p>
<p>&nbsp;</p>
<h2><strong>Comparison with practical example</strong></h2>
<p>&nbsp;</p>
<p>The best way to illustrate these types is through an example. We ran the input text below through both types of summarization; the results follow it.</p>
<h3><strong>Input Text:</strong></h3>
<blockquote><p><em>China&#8217;s Huawei overtook Samsung Electronics as the world&#8217;s biggest seller of mobile phones in the second quarter of 2020, shipping 55.8 million devices compared to Samsung&#8217;s 53.7 million, according to data from research firm Canalys. While Huawei&#8217;s sales fell 5 per cent from the same quarter a year earlier, South Korea&#8217;s Samsung posted a bigger drop of 30 per cent, owing to disruption from the coronavirus in key markets such as Brazil, the United States and Europe, Canalys said. Huawei&#8217;s overseas shipments fell 27 per cent in Q2 from a year earlier, but the company increased its dominance of the China market which has been faster to recover from COVID-19 and where it now sells over 70 per cent of its phones. &#8220;Our business has demonstrated exceptional resilience in these difficult times,&#8221; a Huawei spokesman said. &#8220;Amidst a period of unprecedented global economic slowdown and challenges, we&#8217;re continued to grow and further our leadership position.&#8221; Nevertheless, Huawei&#8217;s position as number one seller may prove short-lived once other markets recover given it is mainly due to economic disruption, a senior Huawei employee with knowledge of the matter told Reuters. Apple is due to release its Q2 iPhone shipment data on Friday.</em></p></blockquote>
<h3><strong>Extractive Summarization Output:</strong></h3>
<blockquote><p><em>While Huawei&#8217;s sales fell 5 per cent from the same quarter a year earlier, South Korea&#8217;s Samsung posted a bigger drop of 30 per cent, owing to disruption from the coronavirus in key markets such as Brazil, the United States and Europe, Canalys said. Huawei&#8217;s overseas shipments fell 27 per cent in Q2 from a year earlier, but the company increased its dominance of the China market which has been faster to recover from COVID-19 and where it now sells over 70 per cent of its phones.</em></p></blockquote>
<h3><strong>Abstractive Summarization Output:</strong></h3>
<blockquote><p><em>Huawei overtakes Samsung as world&#8217;s biggest seller of mobile phones in the second quarter of 2020. Sales of Huawei&#8217;s 55.8 million devices compared to 53.7 million for south Korea&#8217;s Samsung. Shipments overseas fell 27 per cent in Q2 from a year earlier, but company increased its dominance of the china market. Position as number one seller may prove short-lived once other markets recover, a senior Huawei employee says.</em></p></blockquote>
<p>&nbsp;</p>
<h2><strong>Extractive Text Summarization Using Gensim</strong></h2>
<p>&nbsp;</p>
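<p>Import the required libraries and functions. Note that the gensim.summarization module is available in Gensim 3.x and was removed in Gensim 4.0:</p>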
<blockquote><p><strong><em>from gensim.summarization.summarizer import summarize</em></strong></p>
<p><strong><em>from gensim.summarization.textcleaner import split_sentences</em></strong></p></blockquote>
<p>We store the article content in a variable called Input (the input text shown above). Next, we pass it to the summarize function; the second parameter is the ratio of the original text that the summary should keep. We set it to 0.4, so the summary keeps roughly 40% of the sentences of the original text.</p>
<blockquote><p><em>summarize(Input, 0.4)</em></p></blockquote>
<h4><strong>Output:</strong></h4>
<blockquote><p><em>While Huawei&#8217;s sales fell 5 per cent from the same quarter a year earlier, South Korea&#8217;s Samsung posted a bigger drop of 30 per cent, owing to disruption from the coronavirus in key markets such as Brazil, the United States and Europe, Canalys said. Huawei&#8217;s overseas shipments fell 27 per cent in Q2 from a year earlier, but the company increased its dominance of the China market which has been faster to recover from COVID-19 and where it now sells over 70 per cent of its phones.</em></p></blockquote>
<p>With the parameter <strong>split=True</strong>, you can see the output as a list of sentences.</p>
<p>Gensim summarization is based on the TextRank algorithm. As the name suggests, it ranks the sentences of a text and returns the most important ones.</p>
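<p>For intuition, here is a minimal TextRank-style sketch in plain Python. It is not Gensim's implementation, which differs in its details. It assumes the word-overlap similarity from the original TextRank formulation and a fixed number of PageRank-style iterations, and all names and the sample sentences are hypothetical.</p>

```python
import math
import re


def tokenize(sentence):
    """Lowercased set of words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared words, normalised by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denominator = math.log(len(a)) + math.log(len(b))
    if denominator == 0:
        return 0.0
    return len(a.intersection(b)) / denominator


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by running weighted PageRank on the similarity graph."""
    token_sets = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(token_sets[i], token_sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge between j and i.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores
    return scores


sentences = [
    "Huawei overtook Samsung in phone sales.",
    "Huawei sales fell slightly.",
    "Samsung sales fell sharply.",
    "Apple reports on Friday.",
]
for sentence, score in zip(sentences, textrank(sentences)):
    print(f"{score:.3f}  {sentence}")
```

<p>Sentences that share vocabulary with many other sentences reinforce each other and rise to the top, while a sentence with no overlap, such as the last one here, stays at the baseline score.</p>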
<p>&nbsp;</p>
<h2><strong>Extractive Text Summarization Using Huggingface Transformers</strong></h2>
<p>&nbsp;</p>
<p>We summarize the same article as before, but this time we use a transformer model from Huggingface:</p>
<blockquote><p><em>from transformers import pipeline</em></p></blockquote>
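<p>We load a pre-trained summarization model into the pipeline. Since no model name is given, the pipeline falls back to its default checkpoint. Note that such sequence-to-sequence models generate their summary rather than select sentences, so the output may shorten or rephrase the source even when it stays close to it:</p>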
<blockquote><p><em>summarizer = pipeline(&#8220;summarization&#8221;)</em></p></blockquote>
<p>Next, to use this model, we pass the text along with the minimum and maximum length parameters:</p>
<blockquote><p><em>summarizer(Input, min_length=30, max_length=300)</em></p></blockquote>
<h4><strong>Output:</strong></h4>
<blockquote><p><em>China&#8217;s Huawei overtook Samsung Electronics as the world&#8217;s biggest seller of mobile phones in the second quarter of 2020, shipping 55.8 million devices compared to Samsung&#8217;s 53.7 million. Samsung posted a bigger drop of 30 per cent, owing to disruption from coronavirus in key markets such as Brazil, the United States and Europe.</em></p>
<p>&nbsp;</p></blockquote>
<p>&nbsp;</p>
<h2><strong>Conclusion</strong></h2>
<p>&nbsp;</p>
<p><span style="font-weight: 400;">We saw some quick examples of <strong><em>Extractive summarization</em></strong>, one using Gensim&#8217;s TextRank algorithm, and another using Huggingface&#8217;s pre-trained transformer model. In further posts, we will go over <strong>LSTM</strong>, <strong>BERT</strong>, and <strong>Google&#8217;s T5 transformer</strong> models in-depth and look at how they work to do tasks such as <em><strong>abstractive summarization</strong></em>.</span></p>
]]></content:encoded>
					
					<wfw:commentRss>https://turbolab.in/types-of-text-summarization-extractive-and-abstractive-summarization-basics/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">582</post-id>	</item>
	</channel>
</rss>
