Model Guide

Modified on Thu, 28 Aug, 2025 at 8:15 PM

Model Types

Invoke uses two broad types of models, Interpretive and Rendering, each with its own strengths and weaknesses.

	Interpretive Models	Rendering Models
How to think about them	Low learning curve, low to medium creative control	High learning curve, high creative control
What they do	Parse natural-language prompts, follow reference images.	Focus on pixel-level synthesis with fine-grained control.
When to use	You’d rather “say” what you want than draw or composite it.	You need surgical control over style, composition, detail, etc.
Typical output	Good directional drafts that match your prompts or reference images.	Highly controllable work tuned via prompt tags, control layers, and custom-trained models.
Model families	ChatGPT-4o, FLUX Kontext, Gemini 2.5 Flash, Imagen 3, Imagen 4	Any Stable Diffusion XL (SDXL) model (e.g., JuggernautXL), any Stable Diffusion 1.5 (SD 1.5) model, any FLUX.1 (dev) model

Image Models

Model	Type	Control Level	Learning Curve	Best For	Prompt Style	Key Strengths	Limitations
Gemini 2.5 Flash (Nano Banana)	Interpretive	Medium	Low	Reference image manipulation and strong prompt adherence	Instruction + Long-form	Intuitive prompting, follows complex instructions	No image size control; fewer direct control tools
ChatGPT-4o (API)	Interpretive	Low-Medium	Low	Directional drafts, natural language instructions, text	Instruction + Long-form	Intuitive prompting, follows complex instructions	Limited pixel-level control
FLUX Kontext (API)	Interpretive	Low-Medium	Low	Quick iterations, prompt-based transformation	Instruction	Fast, good prompt adherence	Less creative control
Imagen 3 (API)	Interpretive	Medium	Low-Medium	High-quality photorealistic outputs	Long-form	Good image quality, natural language understanding	Limited editing capabilities
Imagen 4 (API)	Interpretive	Medium	Low-Medium	Latest-generation photorealism	Long-form	Cutting-edge quality, improved prompt handling	Limited editing capabilities
JuggernautXL (SDXL)	Rendering	High	High	Detailed creative work, style control	Prompt tags	Fine-grained control, extensive customization, established ecosystem	Full control involves a learning curve
SD 1.5 Models	Rendering	High	High	Extremely efficient styling and rendering work, customization, specialized tasks	Prompt tags	Extensive customization, established ecosystem	Requires technical knowledge
FLUX.1 (dev)	Rendering	High	High	Professional quality, prompt adherence, high-quality customization	Long-form	Developer-focused, high precision	Technical complexity

Video Models

Model	Type	Control Level	Learning Curve	Best For	Prompt Style	Key Strengths	Limitations
Veo 3	Interpretive	Medium	Low	Video and audio generation	Instruction + Long-form	Intuitive prompting, follows complex instructions	Can lack movement or action without a strong prompt

How to write prompts for different models

Prompt style	What it looks like	Best for
Instruction	“Generate a neon-lit logo” “Replace the cobblestone street with flat stones” “Add a misty fog to the scene”	Gemini 2.5, ChatGPT-4o, FLUX Kontext
Long-form	“A cinematic wide-angle shot of a misty rainforest at dawn with soft volumetric light. The rainforest is filled with vibrant, diverse flora and fauna.”	Gemini 2.5, Imagen, FLUX.1 (dev), ChatGPT-4o, FLUX Kontext
Prompt tags	“ultra-wide, 32 mm, concept art, vaporwave palette, award-winning”	SD 1.5, SDXL (e.g., JuggernautXL)

Pro tip: Negative prompts (“deformed, blurry, watermark”) work best on Rendering models.

Recommendations for getting started

Brand-new to AI? Start with Interpretive Models like Imagen or ChatGPT-4o before learning Rendering Models like FLUX or SDXL.
Looking to change something small about an image with text guidance? Choose FLUX Kontext, upload the image as a Global Reference Image, and add a short instruction in the prompt field.
Looking to master what the pros use? Watch our YouTube series and explore control layers and inpainting techniques with FLUX.1 (dev) and SDXL models.

Troubleshooting and tips

Symptom	Likely Cause	Quick Fix
I put in a text prompt but the image comes out kind of janky (weird faces, weird hands, etc.).	Rendering models are weaker at generating “error-free” images from text prompts alone.	If you are only using text prompts, try an Interpretive model.
I’m giving directions to change one part, but it’s changing the whole image.	Some models can’t combine instruction prompts with targeted, localized edits.	Use FLUX Kontext, or (advanced) use an “Inpaint Mask” layer with FLUX.1 (dev) or SDXL.
I can’t get it to generate in the exact style that I want.	The models don’t perfectly understand your style.	Try using a reference image with an interpretive model like FLUX Kontext. If that isn’t sufficient, you can explore training your own LoRA model in our Model Training app.

More resources