
Getting Started with Generative AI

This chapter introduces the key principles behind Generative AI: what it is, how it works, and why it matters for filmmakers, designers, and storytellers. It’s not about coding or model math, but about understanding how AI interprets your creative direction.

1. What Generative AI actually does

Generative AI systems are trained on vast datasets of text, images, video, and sound. They don’t “think” or “know”; they predict what fits best next, based on patterns they’ve learned. When you type a prompt, the AI doesn’t “imagine”; it statistically predicts which words, pixels, or frames are most likely to fit that kind of input.

Generative AI mimics style, not intent. That’s why your direction (your prompt) remains the most creative part of the process.

Step | What happens
1. Training | The model analyses millions of examples (text, photos, scripts, videos) and learns correlations.
2. Prompting | You describe what you want. The model turns your words into mathematical patterns.
3. Generation | The AI predicts and constructs a new output: text, image, sound, or video.
4. Refinement | You review and iterate.
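
The Prompting step, where the model turns your words into numbers, can be sketched as below. This is a hypothetical, simplified example: it splits the prompt on spaces and looks each word up in a random embedding table, whereas real models use learned subword tokenizers (such as byte-pair encoding) and embedding vectors learned during training. The names `build_vocab`, `encode`, and `embedding_table` are illustrative only.

```python
import random

def build_vocab(texts: list[str]) -> dict[str, int]:
    """Assign an integer id to every word seen in 'training' text."""
    vocab = {"<unk>": 0}  # id 0 is reserved for words the model has never seen
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(prompt: str, vocab: dict[str, int]) -> list[int]:
    """Turn a prompt into a list of token ids."""
    return [vocab.get(word, vocab["<unk>"]) for word in prompt.lower().split()]

def embedding_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    # In a trained model these numbers are learned; here they are random placeholders.
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

vocab = build_vocab(["a lone astronaut walks through neon rain"])
ids = encode("A lone astronaut in neon rain", vocab)
table = embedding_table(len(vocab), dim=4)
vectors = [table[i] for i in ids]  # one vector of numbers per token

print(ids)  # → [1, 2, 3, 0, 6, 7]
```

Note that “in” maps to the unknown-word id 0 because it never appeared in the toy vocabulary: the model can only work with patterns it has a representation for.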

Different models handle different types of media. Here’s how they connect to audiovisual production:

Modality | Example Tools | What They Generate | Common Use in AV
Text | ChatGPT, Gemini, Claude | Scripts, story ideas, captions | Concept development, scriptwriting
Image | Midjourney, Leonardo AI, DALL·E | Concept art, storyboards, posters | Visual development
Video | Runway, Pika, Kaiber | Short clips, transitions, visual ideas | Previz, experimental production
Audio | ElevenLabs, MusicFX | Voices, soundtracks, ambience | Voiceovers, sound design
3D / XR | Spline AI, Luma, Gaussian Splatting (a 3D capture and rendering technique) | 3D assets and environments | Virtual sets, previsualisation

🎥 Note: These tools evolve rapidly, but their creative logic (text in → media out) remains the same. That’s why understanding prompting matters more than memorising tool names.

 👉 If you are unfamiliar with any of the terms, please check our overview on the website:

https://www.creativeailab.be/aibc-understanding-the-flows-for-generative-ai-content/

Fine-tuning means retraining part of a model on your own data so it adapts to a specific domain, tone, or visual style.

Why fine-tune?
  • To make outputs reflect your brand, aesthetic, or subject matter
  • To correct unwanted biases or artefacts
  • To improve quality on narrow creative tasks (e.g., interviews, product films, animation styles)

How it works:
  1. Choose a base model (e.g., an open multimodal model).
  2. Prepare a small dataset of paired examples (inputs + desired outputs).
  3. Train or “fit” the model on your data, usually updating only a subset of its weights (such as the final layers or small added adapter layers).
  4. Validate, compare, and iterate.
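
The four steps above can be illustrated with a tiny, pure-Python sketch. Assumptions: `base_model_features` stands in for a frozen pretrained model, `head` is the small final layer that gets trained, and the paired examples follow a made-up rule (y = 2x + 1). Real fine-tuning uses a deep-learning framework and far larger models; only the shape of the workflow carries over.

```python
# Step 1: a hypothetical "base model", a frozen feature extractor that is never updated.
def base_model_features(x: float) -> list[float]:
    return [x, 1.0]  # one input feature plus a constant bias term

# The trainable final layer: a weighted sum of the base model's features.
def head(weights: list[float], x: float) -> float:
    return sum(w * f for w, f in zip(weights, base_model_features(x)))

def mse(weights: list[float], pairs: list[tuple[float, float]]) -> float:
    """Mean squared error between predictions and desired outputs."""
    return sum((head(weights, x) - y) ** 2 for x, y in pairs) / len(pairs)

# Step 2: a small dataset of paired examples (input, desired output), made up.
pairs = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
validation = [pairs[3], pairs[7]]  # held out, never used for training
train = [p for p in pairs if p not in validation]

# Step 3: fit only the head, using plain gradient descent on the training pairs.
def fine_tune(weights, train_pairs, lr=0.5, epochs=500):
    w = list(weights)  # copy, so the starting weights stay untouched
    n = len(train_pairs)
    for _ in range(epochs):
        grads = [0.0] * len(w)
        for x, y in train_pairs:
            err = head(w, x) - y
            for i, f in enumerate(base_model_features(x)):
                grads[i] += 2 * err * f / n
        w = [wi - lr * gi for wi, gi in zip(w, grads)]
    return w

# Step 4: validate by comparing error on unseen examples before and after fitting.
start = [0.0, 0.0]
tuned = fine_tune(start, train)
print(f"validation MSE before: {mse(start, validation):.4f}")
print(f"validation MSE after:  {mse(tuned, validation):.6f}")
```

Only the two weights of `head` change; the “base model” function is reused as-is, which is what makes partial fine-tuning cheaper than training from scratch.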

Keep in mind:

  • Requires GPUs and time; small datasets can overfit easily.
  • Fine-tuning can be partial, e.g. only the vision encoder or the decoder.
  • Always maintain a validation set to check generalisation.
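
A held-out validation set can be as simple as a seeded shuffle and split. The sketch below assumes your examples are (input, target) pairs; the clip and caption names are hypothetical, and the `tolerance` in `looks_overfit` is an arbitrary rule of thumb, not a standard threshold.

```python
import random

def train_validation_split(pairs, val_fraction=0.2, seed=42):
    """Shuffle a copy of the data reproducibly and hold out a fraction for validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def looks_overfit(train_loss: float, val_loss: float, tolerance: float = 1.5) -> bool:
    # Hypothetical heuristic: validation error far above training error suggests the
    # model memorised its training examples instead of generalising.
    return val_loss > tolerance * train_loss

# Hypothetical paired examples: a clip file and the caption style you want.
examples = [(f"clip_{i:02d}.mp4", f"caption {i}") for i in range(10)]
train, validation = train_validation_split(examples)
print(len(train), len(validation))  # → 8 2
print(looks_overfit(train_loss=0.02, val_loss=0.35))  # → True
```

Fixing the seed keeps the split identical between runs, so differences in validation error reflect changes to the model or data rather than a different random split.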

Fine-tuning is increasingly accessible thanks to parameter-efficient techniques such as LoRA (Low-Rank Adaptation) and open multimodal training pipelines.
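
LoRA keeps an original weight matrix W frozen and learns a low-rank update instead: W' = W + (alpha / r) · B·A, where B is d × r, A is r × k, and the rank r is much smaller than d and k. The plain-Python sketch below (helper names are hypothetical) shows why this is cheap: it counts trainable parameters for an illustrative 4096 × 4096 layer and merges a tiny rank-1 update into a 3 × 3 matrix. It omits everything a real training framework handles, such as computing gradients and choosing which layers to adapt.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def lora_merge(w, b, a, alpha: float, r: int):
    """Return W + (alpha / r) * (B @ A) as a new matrix; W itself is not modified."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]

def full_params(d: int, k: int) -> int:
    return d * k  # every entry of W is trainable

def lora_params(d: int, k: int, r: int) -> int:
    return r * (d + k)  # B has d*r entries, A has r*k entries

d = k = 4096
r = 8
print(full_params(d, k), lora_params(d, k, r))  # → 16777216 65536
print(f"{lora_params(d, k, r) / full_params(d, k):.2%} of the layer is trained")  # → 0.39% of the layer is trained

# Tiny 3 x 3 example with rank r = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[1.0, 2.0, 3.0]]           # r x k
B_init = [[0.0], [0.0], [0.0]]  # d x r, starts at zero so the merged weights equal W
B_trained = [[1.0], [0.0], [0.0]]
print(lora_merge(W, B_init, A, alpha=1.0, r=1) == W)   # → True
print(lora_merge(W, B_trained, A, alpha=1.0, r=1)[0])  # → [2.0, 2.0, 3.0]
```

Starting B at zero (as the LoRA paper describes) means the adapted model initially behaves exactly like the base model, and training only has to learn the small difference.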

For audiovisual companies, running AI models locally or on private servers offers several benefits:

Advantages:

  • 🔒 Data privacy — no third-party cloud exposure.
  • ⚙️ Full control — versioning, workflow integration, reproducibility.
  • ⚡ Real-time performance — reduced latency for on-set or live production.

Challenges:

  • Requires powerful hardware (GPUs, high VRAM).
  • Large models may need quantisation or compression to fit local devices.
  • Maintenance and updates are your responsibility.
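
Quantisation trades numeric precision for memory. A rough estimate for the weights alone is number of parameters × bits per parameter ÷ 8 bytes; activations, caches, and framework overhead come on top. The sketch below computes that estimate for an illustrative 7-billion-parameter model and shows the simplest scheme, symmetric per-tensor int8 quantisation. Real toolchains typically use finer-grained (per-channel or per-group) scales and 4-bit formats; the function names and weights here are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantisation: one scale maps floats onto -127..127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

for bits in (32, 16, 8, 4):
    print(f"7B parameters at {bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
# prints 28.0, 14.0, 7.0 and 3.5 GB respectively

weights = [0.12, -0.5, 0.33, 0.0, 0.49]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # rounding error, at most scale / 2
```

Each weight now needs 8 bits instead of 32 or 16, at the cost of a small rounding error per value; whether that error is acceptable depends on the model and task, which is why quantised models should be checked against the original output.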