
A Survey on Multilingual Text to Image AI Generator



EOI: 10.11242/viva-tech.01.08.016




Citation

Ankit Chaudhary, Vinayak Mishra, Durvesh Kasar, Bhavika Thakur, "A Survey on Multilingual Text to Image AI Generator", VIVA-IJRI Volume 1, Issue 8, Article 1, pp. 1-7, 2025. Published by Computer Engineering Department, VIVA Institute of Technology, Virar, India.

Abstract

Multilingual text-to-image generation is a rapidly growing field in artificial intelligence in which models create images from text descriptions written in multiple languages. This survey brings together key ideas and advancements in the field. It explores techniques such as fine-tuning models to personalize image generation, using translation methods to handle different languages, and improving image quality with diffusion models such as Stable Diffusion and SDXL. Applications include education, webtoon creation, architecture, and artistic design, showing how these models can be used across industries. Although the technology has made great progress, challenges remain, such as handling language diversity, reducing biases in training data, and managing the high computational demands of these models. This paper provides an overview of the current state of multilingual text-to-image generation, highlighting how it can create accurate and detailed images from text while discussing limitations and areas for improvement. By addressing these challenges, this survey aims to support the development of more inclusive and effective AI systems for creating visual content from text worldwide.

Keywords

- Artificial Intelligence, Diffusion Models, Generative Models, Image Synthesis, Text-to-Image Generation.

References

  1. Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein and Kfir Aberman, “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation”, Computer Vision and Pattern Recognition, 2023.
  2. Muhammad Ajmal, Farooq Ahmad, AM Martinez-Enriquez and Mudasser Naseer, "Image to Multilingual Text Conversion for Literacy Education", 17th IEEE International Conference on Machine Learning and Applications, 2018.
  3. Mohammed Al-Yaari, Hasan Alkahtani, Vadim Ratner and Yehoshua Y. Zeevi, “Stable denoising-enhancement of images by telegraph-diffusion operators”, IEEE, 2013.
  4. Lorenzo Papa, Lorenzo Faiella, Luca Corvitto, Luca Maiano and Irene Amerini, “On the use of Stable Diffusion for creating realistic faces: from generation to detection”, 11th International Workshop on Biometrics and Forensics, 2023.
  5. Yaoyiran Li, Ching-Yun Chang, Stephen Rawls, Ivan Vulić and Anna Korhonen, “Translation-Enhanced Multilingual Text-to-Image Generation”, Computation and Language, 2023.
  6. Pedro Reviriego and Elena Merino Gómez, “Text to Image Generation: Leaving no Language Behind”, Computation and Language, 2022.
  7. Kyungho Yu, Hyoungju Kim, Jeongin Kim, Chanjun Chun and Pankoo Kim, “A Study on Generating Webtoons Using Multilingual Text-to-Image Models”, Applied Sciences, 2023.
  8. Mr. R. Nanda Kumar, Manoj Kumar M, Hari Hara Sudhan V and Santhosh R, "Text to Image Generation Using AI", International Journal of Creative Research Thoughts, vol 11, issue 5, May 2023.
  9. Aditi Singh, "A Survey of AI Text-to-Image and AI Text-to-Video Generators", Computer Vision and Pattern Recognition, 2023.
  10. Akanksha Singh, Sonam Anekar, Ritika Shenoy and Prof. Sainath Patil, "Text to Image using Deep Learning", International Journal of Engineering Research & Technology, vol 10, issue 4, April 2021.
  11. Enjellina, Eleonora Vilgia Putri Beyan and Anastasya Gisela Cinintya Rossy, “Review of AI Image Generator: Influences, Challenges, and Future Prospects for Architectural Field”, Journal of Artificial Intelligence in Architecture, 2023.
  12. Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Golnari, David A. Clifton, Yuxiong He, Dacheng Tao and Shuaiwen Leon Song, "RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model", Computer Vision and Pattern Recognition, 2023.
  13. Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna and Robin Rombach, "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis", Computer Vision and Pattern Recognition, 2023.
  14. Badhri Narayanan Suresh (chair), Ahmad Kiswani, Ashwin Nanjappa, Itay Hubara, Michal Szutenberg, Rachitha Prem Seelin, Vijaya Singh and Yiheng Zhang, “SDXL: An MLPerf Inference benchmark for text-to-image generation”, MLCommons, 2024.
  15. Nimesh Bali Yadav, Aryan Sinha, Mohit Jain and Aman Agrawal, "Generation of Images from Text Using AI", International Journal of Engineering and Manufacturing, 2024.