Few studies have examined how a national image is represented in English as a foreign language (EFL) textbooks from a multimodal critical cognitive perspective. To address this gap, this study draws on conceptual blending theory, critical discourse analysis, and multimodality to qualitatively examine the co-instantiation of texts, images, and tasks that represent the national image in two Chinese EFL textbook series, published by People’s Education Press (PEP) and Foreign Language Teaching and Research Press (FLTRP). The study explores the text-image-task semiotic relationship in constructing sociocultural meanings in textbooks. Content analysis of the selected textbooks suggests that the indexical relationship among text, image, and task matters in the representation of the national image. The findings reveal how text-image-task co-instantiation helps EFL learners cognitively develop cultural awareness of the national image. The study also compares the PEP and FLTRP series and suggests that teachers’ pedagogical strategies, textbook design, and learners’ learning approaches be improved to support the development of cultural awareness.