To improve children's English listening and speaking skills and to complement traditional textbook materials, I started a project to turn textbook content into 3D Pixar-style animated films. The entire process, from text and audio to animation, was carried out with artificial intelligence (AI) at virtually no cost. This is possible because of recent breakthroughs in AI image and video generation models, which have changed how content is created. In this article, I’ll share the motivation behind the idea, the step-by-step process of creating these animations, and the AI tools that make it accessible to everyone.
Why Turn Textbooks into Animations?
The idea to convert textbook content into animated films stemmed from the limitations of existing English learning materials. While many high-quality English textbooks, such as DK English, integrate vocabulary and dialogues into practical scenarios with accompanying images to aid visual learning, their audio components often fall short. The online audio provided by these textbooks frequently sounds robotic and unnatural, reminiscent of outdated cassette recordings from decades ago. This is a common issue across many English learning resources and contradicts the natural progression of language acquisition: listening, speaking, reading, and writing.
For effective language learning, especially in non-native English environments, children need to hear clear, natural pronunciation that they can imitate to develop proper speaking skills. Without this, they risk developing "deaf-mute English," where they memorize words but struggle to understand them when spoken or to say them fluently. To address this, I envisioned combining high-quality audio with engaging visuals to create animated films that children would enjoy watching repeatedly. By pairing these films with textbook exercises, children could practice speaking in a fun and immersive way. This approach is inexpensive and well suited to learners in non-English-speaking regions. Beyond children, this method could inspire adults to explore creative ways to enhance their own learning or teaching.
Step-by-Step Process for Creating AI-Generated Animations
Here’s how I used free or low-cost AI tools to transform textbook content into engaging 3D animations.
Step 1: Extracting Text from Textbooks
The first step is to extract text from the textbook. Many modern AI models, such as ChatGPT or Google’s Gemini, have robust image recognition capabilities. If a digital version of the textbook isn’t available, you can simply photograph the pages with a smartphone and upload them. I chose Google’s Gemini because it’s a comprehensive, free tool that excels at extracting text from images. Gemini successfully extracted all dialogues and vocabulary from the textbook images, providing a solid foundation for the next steps.
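Once the text has been extracted, it helps to keep dialogues and vocabulary in a structured form for the later steps. The following is a minimal sketch, assuming the extracted text uses a "Speaker: line" layout for dialogues and a "Vocabulary:" header followed by one item per line. That layout, the `Lesson` class, and the speaker names are illustrative assumptions, not the format Gemini necessarily returns.

```python
from dataclasses import dataclass, field


@dataclass
class Lesson:
    dialogue: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)
    vocabulary: list[str] = field(default_factory=list)


def parse_extracted_text(text: str) -> Lesson:
    """Split extracted page text into dialogue turns and vocabulary items."""
    lesson = Lesson()
    in_vocab = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumed layout: a "Vocabulary:" header starts the word list.
        if line.lower().rstrip(":") == "vocabulary":
            in_vocab = True
            continue
        if in_vocab:
            lesson.vocabulary.append(line.lstrip("-• ").strip())
        elif ":" in line:
            speaker, spoken = line.split(":", 1)
            lesson.dialogue.append((speaker.strip(), spoken.strip()))
    return lesson


# Hypothetical sample of extracted text.
sample = """
Anna: Hi, Ben! It's been ages.
Ben: Long time no see!

Vocabulary:
- ages
- see
"""
lesson = parse_extracted_text(sample)
print(lesson.dialogue)
print(lesson.vocabulary)
```

If the model returns a different layout, it is usually easier to ask it to reformat its answer than to make the parser more lenient.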
Step 2: Analyzing and Translating Text
Once the text is extracted, the AI can translate and explain the words and phrases to ensure a deep understanding before proceeding to animation creation. For example, Gemini clarified the difference between phrases like “It’s been ages” and “Long time no see.” This step is crucial for creators who may not be familiar with every term, enabling them to craft accurate and meaningful content for the animation.
Step 3: Crafting the Animation Script
To create the animation script, I used an “AI Employee Template” to guide the AI effectively. This approach treats the AI as an employee, with you as the boss providing clear instructions. The template consists of four parts:
- Why: Explain the purpose and motivation (e.g., enhancing children’s English learning by turning textbook content into engaging animations due to poor audio quality in existing materials).
- What: Specify the task (e.g., convert textbook text into a narrated animated film with explanations of vocabulary and dialogues).
- How: Provide guidance on the process (e.g., create vivid, child-friendly animations with realistic dialogue scenes for imitation).
- Examples: If available, provide reference cases for the AI to emulate.
For instance, I instructed the AI: “The content comes from a textbook called DK English. Due to poor audio quality, I want to convert the text into an animated film with narration to help children learn English. The film must be engaging, explain all vocabulary and dialogues, and include real-life dialogue scenes for children to imitate. Narrate it from the perspective of an English teacher.” Despite the simplicity of this prompt, the AI generated a detailed script with character settings, scene breakdowns, dialogue, animation suggestions, and narration. Providing more detailed information using this template can yield even better results, though the process often involves iterative refinement.
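The four-part template can also be kept as a small reusable structure, so that each new lesson only needs its fields filled in. This is a minimal sketch: the `TaskBrief` class and its `to_prompt` method are hypothetical names chosen for illustration, and the field texts paraphrase the example prompt above.

```python
from dataclasses import dataclass


@dataclass
class TaskBrief:
    why: str
    what: str
    how: str
    examples: str = ""  # optional reference cases

    def to_prompt(self) -> str:
        """Join the non-empty parts into one prompt, in Why/What/How/Examples order."""
        sections = [
            ("Why", self.why),
            ("What", self.what),
            ("How", self.how),
            ("Examples", self.examples),
        ]
        return "\n\n".join(
            f"{title}:\n{body.strip()}" for title, body in sections if body.strip()
        )


brief = TaskBrief(
    why="The textbook audio sounds robotic, so children lack a natural model to imitate.",
    what="Convert the textbook text into a narrated animated film that explains all vocabulary and dialogues.",
    how="Keep it engaging for children, include real-life dialogue scenes, and narrate as an English teacher.",
)
prompt = brief.to_prompt()
print(prompt)
```

Leaving `examples` empty simply drops that section, which matches the "if available" wording of the template.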
Step 4: Generating Natural-Sounding Audio
Next, I converted the script into natural-sounding audio. While many tools excel at generating realistic English speech, fewer handle Chinese well, especially in mixed-language scenarios. Two standout tools are ElevenLabs and MiniMax, both of which can be used for free and support voice cloning. I opted for MiniMax due to its user-friendly interface and support for over 40 languages with multiple preset voices. MiniMax allows seamless switching between languages and even lets users design custom voices via text descriptions. For example, I created a virtual English teacher named Sophia with a custom voice: “Hello everyone, my name is Sophia, and I’m your new English teacher.” MiniMax’s English version also offers voice cloning, which I used to create a Donald Trump-inspired voice for fun concepts like “Trump teaching English.”
For multi-character dialogues, Google AI Studio supports generating two-speaker audio: you paste the dialogue in a specific format and select a different voice for each speaker. This feature simplifies creating conversational scenes, making it ideal for dialogue-heavy animations.
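Preparing the dialogue for pasting can be automated. The sketch below assumes the expected layout is one turn per line in the form "Speaker 1: ..." and "Speaker 2: ..."; check the tool's own placeholder text for the exact format it wants. The function name and the character names are hypothetical.

```python
def to_two_speaker_script(turns: list[tuple[str, str]]) -> str:
    """Render (character, line) turns as 'Speaker N: line', one turn per line."""
    labels: dict[str, str] = {}
    rendered = []
    for character, line in turns:
        if character not in labels:
            # Only two voices are available in a two-speaker setup.
            if len(labels) == 2:
                raise ValueError(f"more than two speakers: {character!r}")
            labels[character] = f"Speaker {len(labels) + 1}"
        rendered.append(f"{labels[character]}: {line}")
    return "\n".join(rendered)


turns = [
    ("Anna", "Hi, Ben! It's been ages."),
    ("Ben", "Long time no see!"),
    ("Anna", "How have you been?"),
]
script = to_two_speaker_script(turns)
print(script)
```

Speakers are numbered in order of first appearance, so the same character always maps to the same voice within a scene.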
Step 5: Creating the Visuals
Recent advancements in AI video generation have made it accessible for ordinary users. Key developments include:
- Subject Consistency: AI image generation models can now maintain a consistent character across multiple scenes based on a reference image, ensuring narrative continuity.
- Extended Video Length: Some models can now generate videos up to a minute long, which reduces the amount of editing needed.
- First-and-Last Frame Functionality: This allows seamless transitions between scenes, enabling effects like aging or costume changes.
- Cost Reduction: Video generation costs have dropped significantly, with some 5-second clips costing just a few cents.
Among the available tools, Google’s Veo 3 model is highly effective but expensive. For personal use, I recommend Chinese AI tools like Kling AI, Hailuo AI, or ByteDance’s Jimeng AI, with Jimeng AI offering the best value. Jimeng AI, which includes the Seedream 4.0 image model, supports features like first-and-last frame transitions, multi-frame video generation, lip-syncing, and AI sound effects at a low cost (e.g., a $10 subscription can generate thousands of images and hundreds of videos).
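To get a feel for what per-clip pricing means for a whole film, a rough estimate is enough. The numbers below (a 3-minute film, 5-second clips, two takes per clip, 5 cents per clip) are hypothetical placeholders, not actual prices from any of these tools.

```python
import math


def estimate_clips(film_seconds: int, clip_seconds: int = 5, takes_per_clip: int = 1) -> int:
    """Number of clips to generate, rounding up and allowing for retakes."""
    return math.ceil(film_seconds / clip_seconds) * takes_per_clip


def estimate_cost_cents(clips: int, cents_per_clip: int) -> int:
    """Total cost in cents; integer cents avoid floating-point rounding."""
    return clips * cents_per_clip


# Hypothetical inputs; replace them with the figures from your own plan.
clips = estimate_clips(film_seconds=180, clip_seconds=5, takes_per_clip=2)
cost = estimate_cost_cents(clips, cents_per_clip=5)
print(f"{clips} clips, about ${cost / 100:.2f}")  # prints "72 clips, about $3.60"
```

Budgeting for more than one take per clip is realistic, since not every generated clip is usable.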
Using Jimeng AI, I selected the “video generation” mode and input a prompt like “Create a young, lively American English teacher in a 3D cartoon style.” For more control, I avoided the “agent mode” (which auto-generates content based on text) and instead used the image or video generation options, specifying aspect ratios (e.g., 16:9) and generating multiple images for scene variety. For example, I used a reference Lego image to generate four scenes depicting a city worker’s day, which could then be turned into a video with actions like “drinking coffee by a window.” Jimeng AI also supports AI sound effects, music, and camera motion effects, making it beginner-friendly.
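Before opening the tool, it can help to write the scenes down as a shot list that repeats the same character description in every prompt. This is a minimal planning sketch, not a Jimeng AI API: the `Shot` class is hypothetical, and the actions other than “drinking coffee by a window” are invented for illustration. It only keeps the wording consistent; the reference-image feature for subject consistency is configured in the tool itself.

```python
from dataclasses import dataclass

# Shared description repeated in every prompt so the character stays recognizable.
CHARACTER = "a young, lively American English teacher in a 3D cartoon style"


@dataclass
class Shot:
    action: str
    aspect_ratio: str = "16:9"  # chosen in the tool's settings, not in the prompt text

    def prompt(self, character: str = CHARACTER) -> str:
        return f"{character}, {self.action}"


shots = [
    Shot("waving hello in a bright classroom"),
    Shot("drinking coffee by a window"),
    Shot("pointing at a word on the whiteboard"),
]
for number, shot in enumerate(shots, start=1):
    print(f"Scene {number} [{shot.aspect_ratio}]: {shot.prompt()}")
```

Each printed line can be pasted into the image or video generation box, with the aspect ratio set separately.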
Step 6: Editing the Final Video
The final step is assembling the video using free editing software like CapCut. Simply drag and drop the generated video clips, audio, and music into the timeline to create a cohesive film. The process is straightforward and requires minimal technical expertise.
The Power of AI in Content Creation
The tools used (ChatGPT and Gemini for text extraction, MiniMax and Google AI Studio for audio, and Jimeng AI for video) are either free or extremely affordable, making this process accessible to anyone. As AI video creator Hanqing from AItalk notes, the value of AI lies not in replicating traditional content but in creating innovative, previously impossible concepts, like “Guan Yu vs. Qin Qiong” or “Sun Wukong’s Corporate Journey.” In the AI era, creativity is the key differentiator, as the tools and techniques are now within everyone’s reach.
By combining these AI tools, anyone can transform static textbook content into engaging, Pixar-style animations that make learning fun and effective. Whether for children or adults, this approach opens up new possibilities for education and creativity.