Document Type : Original/Review Paper

Authors

1 Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Iran.

2 Big Data Research Center, Najafabad Branch, Islamic Azad University, Najafabad, Iran.

Abstract

With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. Applying search engines is one of the plagiarism detection techniques that converts plagiarism patterns to search queries. Generating suitable queries is the heart of this technique and existing methods suffer from lack of producing accurate queries, Precision and Speed of retrieved results. This research proposes a framework called ParaMaker. It generates accurate paraphrases of any sentence, similar to human behaviors and sends them to a search engine to find the plagiarism patterns. For English language, ParaMaker was examined against six known methods with standard PAN2014 datasets. Results showed an improvement of 34% in terms of Recall parameter while Precision and Speed parameters were maintained. In Persian language, statements of suspicious documents were examined compared to an exact search approach. ParaMaker showed an improvement of at least 42% while Precision and Speed were maintained.

Keywords

Main Subjects