Semantic labeling is an active field in remote sensing applications. Although handling high detailed objects in Very High Resolution (VHR) optical image and VHR Digital Surface Model (DSM) is a challenging task, it can improve the accuracy of semantic labeling methods. In this paper, a semantic labeling method is proposed by fusion of optical and normalized DSM data. Spectral and spatial features are fused into a Heterogeneous Feature Map to train the classifier. Evaluation database classes are impervious surface, building, low vegetation, tree, car, and background. The proposed method is implemented on Google Earth Engine. The method consists of several levels. First, Principal Component Analysis is applied to vegetation indexes to find maximum separable color space between vegetation and non-vegetation area. Gray Level Co-occurrence Matrix is computed to provide texture information as spatial features. Several Random Forests are trained with automatically selected train dataset. Several spatial operators follow the classification to refine the result. Leaf-Less-Tree feature is used to solve the underestimation problem in tree detection. Area, major and, minor axis of connected components are used to refine building and car detection. Evaluation shows significant improvement in tree, building, and car accuracy. Overall accuracy and Kappa coefficient are appropriate.