H.3. Artificial Intelligence
Damianus Kofi Owusu; Christiana Cynthia Nyarko; Joseph Acquah; Joel Yarney
Abstract
Head and neck cancer (HNC) recurrence is increasing among Ghanaian men and women. Not all machine learning (ML) classifiers are created equal: even when several of them suit a given task well, it can be very difficult to find one that performs optimally across different data distributions. Stacking learns how best to combine weak classifier models into a strong model. This study sought to identify the best stacked ensemble classifier, as a prognostic model for classifying head and neck squamous cell carcinoma (HNSCC) recurrence patterns, when the same ML classifiers are used for both feature selection and stacked ensemble learning. Four stacked ensemble models were developed, each using a gradient boosting machine (GBM) as the meta-classifier: the first used two base classifiers, GBM and distributed random forest (DRF); the second used three, GBM, DRF, and deep neural network (DNN); the third used four, GBM, DRF, DNN, and generalized linear model (GLM); and the fourth used five, GBM, DRF, DNN, GLM, and naïve Bayes (NB). The results showed that the stacked ensemble with five base classifiers performed better on gradient-boosted features than on the other feature subsets, and that it outperformed the other stacked ensembles on both the gradient-boosted features and the other feature subsets. A stacked ensemble with five base classifiers trained on GBM features is therefore clinically appropriate as a prognostic model for classifying and predicting recurrence in HNSCC patients.
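The stacking idea described in this abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the base learners here are simple one-feature threshold rules and the meta-classifier is a small logistic regression, standing in for the GBM, DRF, DNN, GLM, and NB models and the GBM meta-classifier named above. All class names, function names, and the toy data are hypothetical. What the sketch does show is the core mechanism: out-of-fold predictions of the base learners become the input features of the meta-classifier.

```python
import math


def sigmoid(s):
    s = max(-60.0, min(60.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))


class MeanThresholdStump:
    """Weak base learner: thresholds one feature at the midpoint of the class means."""

    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [x[self.feature] for x, t in zip(X, y) if t == 1]
        neg = [x[self.feature] for x, t in zip(X, y) if t == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.threshold = (mean_pos + mean_neg) / 2
        self.sign = 1.0 if mean_pos >= mean_neg else -1.0
        return self

    def predict_proba(self, x):
        # Signed distance to the threshold, squashed into (0, 1).
        return sigmoid(self.sign * (x[self.feature] - self.threshold))


class LogisticMeta:
    """Meta-classifier: logistic regression fitted by stochastic gradient descent."""

    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            for z, t in zip(Z, y):
                g = self.predict_proba(z) - t  # gradient of the log loss w.r.t. the logit
                self.w = [w - lr * g * zi for w, zi in zip(self.w, z)]
                self.b -= lr * g
        return self

    def predict_proba(self, z):
        return sigmoid(self.b + sum(w * zi for w, zi in zip(self.w, z)))


def stratified_folds(y, k):
    """Assign fold ids round-robin within each class so every fold sees both classes."""
    seen, folds = {}, []
    for label in y:
        folds.append(seen.get(label, 0) % k)
        seen[label] = seen.get(label, 0) + 1
    return folds


def fit_stack(X, y, base_factories, k=3):
    folds = stratified_folds(y, k)
    Z = [[0.0] * len(base_factories) for _ in X]  # out-of-fold meta-features
    for f in range(k):
        train = [i for i in range(len(X)) if folds[i] != f]
        for j, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i in range(len(X)):
                if folds[i] == f:
                    Z[i][j] = model.predict_proba(X[i])
    meta = LogisticMeta().fit(Z, y)                        # level 1: learn the combination
    bases = [make().fit(X, y) for make in base_factories]  # level 0: refit on all data
    return bases, meta


def predict_stack(bases, meta, x):
    return int(meta.predict_proba([b.predict_proba(x) for b in bases]) >= 0.5)


# Invented toy data: feature 0 is informative, feature 1 is mostly noise.
X = [[0.0, 1.0], [0.5, 0.2], [1.0, 0.9], [0.2, 0.4],
     [4.0, 0.3], [4.5, 1.1], [5.0, 0.6], [4.2, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
bases, meta = fit_stack(X, y, [lambda: MeanThresholdStump(0), lambda: MeanThresholdStump(1)])
```

Calling `predict_stack(bases, meta, [4.8, 0.5])` then classifies a new case. Training the meta-classifier on out-of-fold predictions, rather than on predictions for cases the base learners have already seen, keeps it from simply rewarding whichever base learner overfits most.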
Seyed Mahdi Sadatrasoul; Omid Mahdi Ebadati; Amir Amirzadeh Irani
Abstract
Companies smooth their financial statements for various reasons, including the requirements of annual general meetings, auditors, regulatory and supervisory institutions, and shareholders. Smoothing exploits the various feasible choices in recognizing a company’s incomes, costs, expenses, assets, and liabilities. Smoothing can undermine the reliability of credit scoring models: it can lead to granting facilities to a non-creditworthy organization or denying them to a creditworthy one. These decision errors are reported as “type I” and “type II” errors, respectively, and are very important for a bank’s loan portfolio. To the best of the authors’ knowledge, this paper is the first credit scoring study to investigate this issue. A logistic regression model is first fitted to data on companies associated with a major Asian bank. Different smoothing scenarios are then tested, and the Wilcoxon statistic indicates that traditional credit scoring models have significant errors when smoothing changes the parameters of a company’s financial statements and balance sheets by more than 20%.
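The two decision errors named in this abstract can be made concrete with a short Python sketch. It assumes the convention suggested by the abstract's wording, namely that a type I error is granting facilities to a non-creditworthy company and a type II error is denying them to a creditworthy one; other credit scoring papers swap the two labels. The function name and the sample data are hypothetical.

```python
def decision_error_rates(creditworthy, approved):
    """Return (type I rate, type II rate) for a set of lending decisions.

    Assumed convention (taken from the abstract's ordering):
      type I  = facility granted to a non-creditworthy company
      type II = facility denied to a creditworthy company
    """
    if len(creditworthy) != len(approved):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(creditworthy, approved))
    n_bad = sum(1 for worthy, _ in pairs if not worthy)
    n_good = len(pairs) - n_bad
    type_i = sum(1 for worthy, ok in pairs if not worthy and ok)
    type_ii = sum(1 for worthy, ok in pairs if worthy and not ok)
    # Each rate is undefined (None) when its class is absent from the data.
    rate_i = type_i / n_bad if n_bad else None
    rate_ii = type_ii / n_good if n_good else None
    return rate_i, rate_ii


# Invented example: five companies, true status versus the model's decision.
creditworthy = [True, True, True, False, False]
approved = [True, False, True, True, False]
rate_i, rate_ii = decision_error_rates(creditworthy, approved)
```

Comparing these two rates before and after a smoothing scenario is applied to the input statements is one simple way to see how smoothing shifts a model's decisions.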
Seyed M. Sadatrasoul; O. Ebadati; R. Saedi
Abstract
The purpose of this study is to reduce the uncertainty in predicting the success of early-stage start-ups and to fill a gap in previous studies by identifying and evaluating success variables and developing a novel business success/failure (S/F) data mining classification model for Iranian start-ups. To this end, the paper extends the variables and algorithms of the Bill Gross and Robert Lussier S/F prediction models to the new context of Iranian start-ups that begin in accelerators, in order to build a new S/F prediction model. A sample of 161 Iranian start-ups based in accelerators from 2013 to 2018 is used, and 39 variables are extracted from the literature and organized into five groups. The sample is then fed into six well-known classification algorithms. Two-stage stacking outperforms the other six classification-based S/F prediction models and predicts the binary dependent variable (success or failure) with an average accuracy of 89%. The findings also show that “starting from accelerators”, “creativity and problem-solving ability of founders”, “first-mover advantage”, and “amount of seed investment” are the four most important variables affecting start-up success, while the other 15 variables are less important.
H.3.8. Natural Language Processing
S. Lazemi; H. Ebrahimpour-komleh
Abstract
A dependency parser is one of the most fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words according to dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser for Persian is developed with the help of a phrase-structure parser. The feature space defined for a parser is one of the important factors in its success. Our goal is to generate and extract features appropriate for the dependency parsing of Persian sentences. To this end, new semantic and syntactic features are defined and added to MSTParser by the stacking method. The semantic features are obtained from word clustering algorithms based on syntagmatic analysis; the syntactic features are obtained from the Persian phrase-structure parser and are encoded as bit-strings. Experiments are carried out on the Persian Dependency Treebank (PerDT) and the Uppsala Persian Dependency Treebank (UPDT). The results indicate that the new features improve the performance of the dependency parser for Persian. The unlabeled attachment scores achieved on PerDT and UPDT are 89.17% and 88.96%, respectively.
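The unlabeled attachment score (UAS) reported above is the fraction of tokens whose predicted head matches the gold-standard head, ignoring the relation label. The Python sketch below shows that computation under stated assumptions: each sentence is a list of head indices, one per token, with 0 marking the root, and every token is scored. Whether the paper excludes punctuation from scoring is not stated in the abstract, so that choice is an assumption here. The function name and the example sentences are hypothetical.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index.

    Both arguments are lists of sentences; each sentence is a list of head
    indices (0 = root), one entry per token. All tokens are scored.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("different number of sentences")
    correct = total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        if len(gold) != len(pred):
            raise ValueError("sentence lengths differ")
        correct += sum(1 for g, p in zip(gold, pred) if g == p)
        total += len(gold)
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Invented example: two sentences, five tokens, one wrong head in the first.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
uas = unlabeled_attachment_score(gold, pred)  # 4 of 5 heads match
```

The labeled variant of the score would additionally require the relation label on each arc to match, which is why UAS is always at least as high as the labeled score on the same output.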