[1] Z. Zhang, Y. Xie, and L. Yang, “Photographic text-to-image synthesis with a hierarchically-nested adversarial network”, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6199-6208, 2018.
[2] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks”, in Advances in neural information processing systems, pp. 2672-2680, 2014.
[3] Y. Li, Y. Chen, and Y. Shi, “Brain tumor segmentation using 3D generative adversarial networks”, International Journal of Pattern Recognition and Artificial Intelligence, p. 2157002, 2020.
[4] Y. Li, Z. He, Y. Zhang, and Z. Yang, “High-quality many-to-many voice conversion using transitive star generative adversarial networks with adaptive instance normalization”, Journal of Circuits, Systems and Computers, 2020.
[5] A. Fakhari and K. Kiani, “An image restoration architecture using abstract features and generative models”, Journal of AI and Data Mining, Vol. 9, No. 1, pp. 129-139, 2021.
[6] M.M. Haji-Esmaeili and G. Montazer, “Automatic coloring of grayscale images using generative adversarial networks”, Journal of Signal and Data Processing (JSDP), Vol. 16, No. 1, pp. 57-74, 2019.
[7] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis”, arXiv preprint arXiv:1605.05396, 2016.
[8] H. Zhang, T. Xu, H. Li, S. Zhang, X. Huang, X. Wang, and D. Metaxas, “Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks”, in Proc. of the IEEE int. conference on computer vision, pp. 5907-5915, 2017.
[9] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas, “Stackgan++: Realistic image synthesis with stacked generative adversarial networks”, IEEE transactions on pattern analysis and machine intelligence, Vol. 41, pp. 1947-1962, 2018.
[10] K.J. Joseph, A. Pal, S. Rajanala, and V.N. Balasubramanian, “C4synth: Cross-caption cycle-consistent text-to-image synthesis”, in IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 358-366, 2019.
[11] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, “Attngan: Fine-grained text to image generation with attentional generative adversarial networks”, in Proc. of the IEEE conf. on computer vision and pattern recognition, pp. 1316-1324, 2018.
[12] M. Zhu, P. Pan, W. Chen, and Y. Yang, “Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis”, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5802-5810, 2019.
[13] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie, “The Caltech-UCSD Birds-200-2011 dataset”, 2011.
[14] N. Ilinykh, S. Zarrieß, and D. Schlangen, “Tell me more: A dataset of visual scene description sequences”, in Proc. of the 12th International Conference on Natural Language Generation, pp. 152-157, 2019.
[15] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans”, in Advances in neural information processing systems (NIPS), pp. 2234-2242, 2016.
[16] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium”, arXiv preprint arXiv:1706.08500, 2017.
[17] P. Zhou, W. Shi, J. Tian, Z. Qi, B. Li, H. Hao, and B. Xu, “Attention-based bidirectional long short-term memory networks for relation classification”, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 207-212, 2016.
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision”, in Proc. of the IEEE conf. on computer vision and pattern recognition, pp. 2818-2826, 2016.
[19] C. Gulcehre, S. Chandar, K. Cho, and Y. Bengio, “Dynamic neural Turing machine with continuous and discrete addressing schemes”, Neural computation, Vol. 30, pp. 857-884, 2018.
[20] A. Miller, A. Fisch, J. Dodge, A. H. Karimi, A. Bordes, and J. Weston, “Key-value memory networks for directly reading documents”, in Proc. of Empirical Methods in Natural Language Processing (EMNLP), 2016.
[21] X. Yan, J. Yang, K. Sohn, and H. Lee, “Attribute2image: Conditional image generation from visual attributes”, in European Conf. on Computer Vision, pp. 776-791, 2016.
[22] X. Zhu, A.B. Goldberg, M. Eldawy, C.R. Dyer, and B. Strock, “A text-to-picture synthesis system for augmenting communication”, in Proc. of the AAAI Conference on Artificial Intelligence (AAAI), pp. 1590-1595, 2007.
[23] A. Dash, J.C.B. Gamboa, S. Ahmed, M. Liwicki, and M. Z. Afzal, “TAC-GAN: Text conditioned auxiliary classifier generative adversarial network”, arXiv preprint arXiv:1703.06412, 2017.
[24] J. Y. Koh, J. Baldridge, H. Lee, and Y. Yang, “Text-to-image generation grounded by fine-grained user attention”, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 237-246, 2021.
[25] T. Baltrusaitis, C. Ahuja, and L. P. Morency, “Multi-modal machine learning: A survey and taxonomy”, IEEE transactions on pattern analysis and machine intelligence, Vol. 41, pp. 423-443, 2018.
[26] W. Li, P. Zhang, L. Zhang, Q. Huang, X. He, S. Lyu, and J. Gao, “Object-driven text-to-image synthesis via adversarial training”, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12174-12182, 2019.
[27] G. Yin, B. Liu, L. Sheng, N. Yu, X. Wang, and J. Shao, “Semantics disentangling for text-to-image generation”, in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 2327-2336, 2019.