Z. Zhang, Y. Xie, and L. Yang, “Photographic text-to-image synthesis with a hierarchically-nested adversarial network”, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6199-6208, 2018.
 I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks”, in Advances in neural information processing systems, pp. 2672-2680, 2014.
 Y. Li, Y. Chen, and Y. Shi, “Brain tumor segmentation using 3D generative adversarial networks”, International Journal of Pattern Recognition and Artificial Intelligence, p. 2157002, 2020.
 Y. Li, Z. He, Y. Zhang, and Z. Yang, “High-quality many-to-many voice conversion using transitive star generative adversarial networks with adaptive instance normalization”, Journal of Circuits, Systems and Computers, 2020.
 A. Fakhari and K. Kiani, “An image restoration architecture using abstract features and generative models”, Journal of AI and Data Mining, Vol. 9, No. 1, pp. 129-139, 2021.
 M.M. Haji-Esmaeili and G. Montazer, “Automatic coloring of grayscale images using generative adversarial networks”, Journal of Signal and Data Processing (JSDP), Vol. 16 (1), pp. 57-74, 2019.
 S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis”, arXiv preprint arXiv:1605.05396, 2016.
 H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, “StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks”, in Proc. of the IEEE Int. Conference on Computer Vision, pp. 5907-5915, 2017.
 H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas, “StackGAN++: Realistic image synthesis with stacked generative adversarial networks”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 41, pp. 1947-1962, 2018.
 K.J. Joseph, A. Pal, S. Rajanala, and V.N. Balasubramanian, “C4Synth: Cross-caption cycle-consistent text-to-image synthesis”, in IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 358-366, 2019.
 T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, “AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks”, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1316-1324, 2018.
 M. Zhu, P. Pan, W. Chen, and Y. Yang, “DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis”, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802-5810, 2019.
 C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, “The Caltech-UCSD Birds-200-2011 Dataset”, Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
 N. Ilinykh, S. Zarrieß, and D. Schlangen, “Tell me more: A dataset of visual scene description sequences”, in Proc. of the 12th International Conference on Natural Language Generation, pp. 152-157, 2019.
 T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training GANs”, in Advances in Neural Information Processing Systems (NIPS), pp. 2234-2242, 2016.
 M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium”, arXiv preprint arXiv:1706.08500, 2017.
 P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, “Attention-based bidirectional long short-term memory networks for relation classification”, in Proceedings of the 54th annual meeting of the association for computational linguistics, pp. 207-212, 2016.
 C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision”, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.
 C. Gulcehre, S. Chandar, K. Cho, and Y. Bengio, “Dynamic neural Turing machine with continuous and discrete addressing schemes”, Neural Computation, Vol. 30, pp. 857-884, 2018.
 A. Miller, A. Fisch, J. Dodge, A. H. Karimi, A. Bordes, and J. Weston, “Key-value memory networks for directly reading documents”, in Proc. of Empirical Methods in Natural Language Processing (EMNLP), 2016.
 X. Yan, J. Yang, K. Sohn, and H. Lee, “Attribute2Image: Conditional image generation from visual attributes”, in European Conf. on Computer Vision, pp. 776-791, 2016.
 X. Zhu, A.B. Goldberg, M. Eldawy, C.R. Dyer, and B. Strock, “A text-to-picture synthesis system for augmenting communication”, in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), pp. 1590-1595, 2007.
 A. Dash, J.C.B. Gamboa, S. Ahmed, M. Liwicki, and M. Z. Afzal, “TAC-GAN: Text conditioned auxiliary classifier generative adversarial network”, arXiv preprint arXiv:1703.06412, 2017.
 J. Y. Koh, J. Baldridge, H. Lee, and Y. Yang, “Text-to-image generation grounded by fine-grained user attention”, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 237-246, 2021.
 T. Baltrusaitis, C. Ahuja, and L. P. Morency, “Multimodal machine learning: A survey and taxonomy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 41, pp. 423-443, 2018.
 W. Li, P. Zhang, L. Zhang, Q. Huang, X. He, S. Lyu, and J. Gao, “Object-driven text-to-image synthesis via adversarial training”, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12174-12182, 2019.
 G. Yin, B. Liu, L. Sheng, N. Yu, X. Wang, and J. Shao, “Semantics disentangling for text-to-image generation”, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2327-2336, 2019.