<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>Shahrood University of Technology</PublisherName>
				<JournalTitle>Journal of AI and Data Mining</JournalTitle>
				<Issn>2322-5211</Issn>
				<Volume>10</Volume>
				<Issue>4</Issue>
				<PubDate PubStatus="epublish">
					<Year>2022</Year>
					<Month>11</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Speech Emotion Recognition using Enriched Spectrogram and Deep Convolutional Neural Network Transfer Learning</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>539</FirstPage>
			<LastPage>547</LastPage>
			<ELocationID EIdType="pii">2576</ELocationID>
			
<ELocationID EIdType="doi">10.22044/jadm.2022.12241.2372</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>B. Z.</FirstName>
					<LastName>Mansouri</LastName>
<Affiliation>Electrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.</Affiliation>

</Author>
<Author>
					<FirstName>H.R.</FirstName>
					<LastName>Ghaffary</LastName>
<Affiliation>Electrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.</Affiliation>

</Author>
<Author>
					<FirstName>A.</FirstName>
					<LastName>Harimi</LastName>

						<AffiliationInfo>
						<Affiliation>Electrical and Computer Engineering Department, Ferdows branch, Islamic Azad University, Ferdows, Iran.</Affiliation>
						</AffiliationInfo>

						<AffiliationInfo>
						<Affiliation>Electrical and Computer Engineering Department, Shahrood branch, Islamic Azad University, Shahrood, Iran.</Affiliation>
						</AffiliationInfo>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2022</Year>
					<Month>08</Month>
					<Day>28</Day>
				</PubDate>
			</History>
		<Abstract>Speech emotion recognition (SER) is a challenging field of research that has attracted attention over the last two decades. Feature extraction has been reported as the most challenging issue in SER systems. Deep neural networks have partially solved this problem in some other applications. To address this problem, we propose a novel enriched spectrogram computed by fusing wide-band and narrow-band spectrograms. The proposed spectrogram benefits from both high temporal and high spectral resolution. The resulting spectrogram images are then fed to the pre-trained deep convolutional neural network ResNet152, whose last layer we replace with five additional layers to adapt the model to the present task. All experiments on the popular EmoDB dataset use the leave-one-speaker-out technique, which guarantees that the model is speaker-independent. The model achieves an accuracy of 88.97%, which demonstrates the effectiveness of the proposed approach compared with other state-of-the-art methods.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">Wideband and narrowband spectrogram</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">ResNet152</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">DCNN</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Transfer learning</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Speech emotion recognition</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://jad.shahroodut.ac.ir/article_2576_9ab41e081a2486b986bdbc5e658ee3f4.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
