Getting Started with Generative AI
This chapter introduces the key principles behind Generative AI: what it is, how it works, and why it matters for filmmakers, designers, and storytellers. It’s not about coding or model math, but about understanding how AI interprets your creative direction.
1. What Generative AI Actually Does
Generative AI systems are trained on vast datasets of text, images, video, and sound. They don’t “think” or “know”; they predict what most plausibly comes next, based on patterns learned during training. When you type a prompt, the AI doesn’t “imagine” anything: it statistically predicts which words, pixels, or frames typically fit that kind of input.
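For readers who want to see what "statistical prediction" means in practice, here is an optional, deliberately tiny sketch in Python. It counts which word follows which in a few invented example sentences (the hypothetical "training data") and then picks the most frequent follower. Real models use large neural networks rather than simple word counts, so treat this only as an illustration of the idea, not as how any particular tool works.

```python
from collections import Counter, defaultdict

# Tiny invented "training data" (hypothetical, for illustration only)
corpus = [
    "the camera pans left",
    "the camera pans right",
    "the camera zooms in",
    "the actor walks left",
]

# "Training": count how often each word is followed by each other word
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follow_counts[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if it was never followed by anything."""
    if word not in follow_counts:
        return None
    return follow_counts[word].most_common(1)[0][0]

def continue_prompt(prompt, max_words=3):
    """Extend a prompt one predicted word at a time until no prediction is possible."""
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)

print(predict_next("camera"))  # pans ("pans" was seen twice, "zooms" once)
print(continue_prompt("the"))
```

The model never "understands" cameras; it only reproduces the most frequent pattern in its data, which is the same reason real systems drift toward common, average-looking results unless your prompt steers them.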
2. From Data to Creativity
Generative AI mimics style, not intent. That’s why your direction (your prompt) remains the most creative part of the process.
| Step | What happens |
|---|---|
| 1. Training | The model analyses millions of examples (text, photos, scripts, videos) and learns correlations. |
| 2. Prompting | You describe what you want. The model converts your words into numbers (tokens and vectors) that it can compute with. |
| 3. Generation | The AI predicts and constructs a new output — text, image, sound, or video. |
| 4. Refinement | You review the result, adjust your prompt or settings, and generate again. |
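Step 2 (Prompting) starts with turning text into numbers. The optional sketch below shows the simplest possible version: a hypothetical word-level vocabulary that maps each known word to an integer ID, with unknown words mapped to a reserved ID. Real tools use subword tokenizers and then map each ID to a learned vector (an embedding); the functions `build_vocab` and `encode` here are invented for illustration.

```python
def build_vocab(texts):
    """Assign an integer ID to every distinct word, in order of first appearance."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for words the model has never seen
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(prompt, vocab):
    """Turn a prompt into a list of integer IDs; unknown words map to 0."""
    return [vocab.get(word, 0) for word in prompt.lower().split()]

# Hypothetical "known" vocabulary built from two example prompts
vocab = build_vocab(["a foggy harbour at dawn", "a neon street at night"])
print(encode("A neon harbour at midnight", vocab))  # [1, 6, 3, 4, 0]
```

Note that "midnight" becomes the unknown ID 0: a model can only work with what its vocabulary and training data cover, which is one reason unusual terms in a prompt may be interpreted loosely.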
3. The Family of Generative Tools
Different models handle different types of media. Here’s how they connect to audiovisual production:
| Modality | Example Tools | What They Generate | Common Use in AV |
|---|---|---|---|
| Text | ChatGPT, Gemini, Claude | Scripts, story ideas, captions | Concept development, scriptwriting |
| Image | Midjourney, Leonardo AI, DALL·E | Concept art, storyboards, posters | Visual development |
| Video | Runway, Pika, Kaiber | Short clips, transitions, visual ideas | Previz, experimental production |
| Audio | ElevenLabs, MusicFX | Voices, soundtracks, ambience | Voiceovers, sound design |
| 3D / XR | Spline AI, Luma AI, Gaussian splatting tools | 3D assets and environments | Virtual sets, previsualisation |
Note: These tools evolve rapidly, but their creative logic (text in → media out) remains the same. That’s why understanding prompting matters more than memorising tool names.
If you are unfamiliar with any of the terms, please check our overview on the website:
https://www.creativeailab.be/aibc-understanding-the-flows-for-generative-ai-content/
4. Fine-Tuning (Custom Training)
Fine-tuning means continuing to train an existing (pretrained) model, or part of it, on your own data so it adapts to a specific domain, tone, or visual style.
Why fine-tune?
- To make outputs reflect your brand, aesthetic, or subject matter
- To reduce unwanted biases or artefacts
- To improve quality on narrow creative tasks (e.g., interviews, product films, animation styles)
How it works:
- Choose a base model (e.g., an open multimodal model).
- Prepare a small dataset of paired examples (inputs + desired outputs).
- Train (“fit”) the model on your data, usually updating only the final layers or a small set of added parameters.
- Validate, compare, and iterate.
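The steps above can be sketched in plain Python to make their structure visible. This is a toy, not a real fine-tuning setup: the "base model" is a hypothetical fixed function that is never trained (frozen), only a small final layer of three weights is fitted with gradient descent on invented paired examples, and a held-out validation set is scored at the end. Real projects use a deep-learning framework and a pretrained network, but the division into frozen base, trainable layer, training data, and validation data is the same idea.

```python
def base_features(x):
    """Frozen 'base model' (hypothetical stand-in): a fixed feature extractor that is never trained."""
    return [1.0, x, x * x]

def predict(weights, x):
    """Final layer: a weighted sum of the frozen features; only `weights` is trained."""
    return sum(w * f for w, f in zip(weights, base_features(x)))

def mse(weights, data):
    """Mean squared error over (input, desired output) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(train, val, lr=0.1, epochs=500):
    """Fit only the final-layer weights with gradient descent, then report train and validation error."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in train:
            err = predict(weights, x) - y
            # Gradient of the mean squared error with respect to each final-layer weight
            for i, f in enumerate(base_features(x)):
                grads[i] += 2 * err * f / len(train)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, mse(weights, train), mse(weights, val)

# Hypothetical paired examples (input, desired output): desired output = 2 * input + 1
train = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
val = [(x, 2 * x + 1) for x in (-0.75, 0.25, 0.75)]  # held out, never used for training

weights, train_err, val_err = fine_tune(train, val)
print(f"weights {weights}, train MSE {train_err:.6f}, validation MSE {val_err:.6f}")
```

Comparing the training and validation errors is the "validate, compare" step: a model that scores well on its training pairs but poorly on the held-out pairs has memorised rather than generalised.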
Things to know:
- Requires GPUs and time; small datasets can overfit easily.
- Fine-tuning can be partial — e.g., only the vision encoder or decoder.
- Always maintain a validation set to check generalisation.
Fine-tuning is increasingly accessible thanks to parameter-efficient methods such as LoRA (Low-Rank Adaptation) and open multimodal training pipelines.
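The core idea of LoRA is that, instead of updating a large weight matrix W (size d × k) directly, you keep W frozen and train two small matrices B (d × r) and A (r × k) whose product is added on top: W' = W + (alpha / r) · B · A, with a small rank r. The optional sketch below shows that arithmetic and why it saves so much memory. The helper names and the toy numbers are invented for illustration; real LoRA training is done with libraries built on deep-learning frameworks.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_weight(W, B, A, alpha, r):
    """Effective weight W + (alpha / r) * (B @ A). W stays frozen; only B and A are trained."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

def trainable_params(d, k, r):
    """Values updated by full fine-tuning (d*k) versus by LoRA (d*r + r*k)."""
    return d * k, d * r + r * k

# Toy 2 x 2 example with rank r = 1 (numbers invented for illustration)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d x r
A = [[0.5, 0.5]]     # r x k
print(lora_weight(W, B, A, alpha=1.0, r=1))  # [[1.5, 0.5], [1.0, 2.0]]

# A 4096 x 4096 layer adapted with rank 8
print(trainable_params(4096, 4096, 8))  # (16777216, 65536)
```

For that example layer, LoRA trains about 0.4% of the values a full update would touch, which is why it fits on much smaller GPUs. In the original LoRA method, B starts at zero, so training begins from the unchanged base model; the small adapter files it produces can also be swapped in and out for different styles.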
5. Working Locally (On-Device / On-Premise)
For audiovisual companies, running AI models locally or on private servers offers several benefits:
Advantages:
- Data privacy: no third-party cloud exposure.
- Full control: versioning, workflow integration, reproducibility.
- Real-time performance: reduced latency for on-set or live production.
Challenges:
- Requires powerful hardware (GPUs, high VRAM).
- Large models may need quantisation or compression to fit local devices.
- Maintenance and updates are your responsibility.
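Two of these challenges come down to simple arithmetic. The optional sketch below first estimates how much memory a model's weights need at different precisions (number of parameters × bits per parameter), using a hypothetical 7-billion-parameter model, and then shows the basic idea of quantisation: storing each weight as a small integer plus one shared scale factor. This is a simplified symmetric int8 scheme chosen for illustration; production toolkits use more elaborate methods (for example per-channel or per-block scales and calibration), and real memory use is higher than the weights alone.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Rough size of the weights alone, in GB (ignores activations, caches, and runtime overhead)."""
    return num_params * bits_per_param / 8 / 1e9

def quantise_int8(values):
    """Symmetric per-tensor int8 quantisation: floats -> integers in [-127, 127] plus one scale."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid dividing by zero for an all-zero tensor
    scale = max_abs / 127
    return [round(v / scale) for v in values], scale

def dequantise(q, scale):
    """Approximate the original floats from the stored integers."""
    return [x * scale for x in q]

# Hypothetical 7-billion-parameter model at different precisions
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(7e9, bits):.1f} GB")  # ~14.0, ~7.0, ~3.5 GB

# Invented example weights
example = [0.42, -1.27, 0.003, 0.9]
q, scale = quantise_int8(example)
print(q, dequantise(q, scale))
```

Halving the bits halves the memory, at the cost of small rounding errors in every weight (visible when comparing the dequantised values to the originals). Whether that loss is acceptable depends on the model and the task, so quantised models should be checked against the full-precision version before they are used in production.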