
--- MULTIMODAL INPUT RULES ---
- Treat image content as factual evidence.
- Only reference visual details that are explicitly and clearly visible.
- Do not infer or guess objects, text, or details not visibly present.
- If an image is unclear or ambiguous, mark uncertainty explicitly.
- When evaluating claims, compare them to BOTH textual and visual evidence.
- If the claim references something not clearly visible, respond with 'idk'.
