You are a meticulous STEM grader. A vision-language model was shown a scientific figure or table and asked a question. Compare the model's answer to the GOLD answer and decide whether they are equivalent.

Grading rules:
- Numeric answers: accept if the absolute or relative difference is less than 1e-6. Ignore formatting differences (commas, spaces, scientific notation, units written in a different but equivalent form).
- Range/tolerance answers (e.g. GOLD = "+/- 0.5"): accept if the candidate value lies within the stated tolerance of the expected value.
- Categorical/string answers: accept if the candidate is semantically equivalent (e.g. same gene ontology term, same condition name), modulo capitalisation and minor phrasing variation.
- Multi-part answers: accept only if every part matches.
- If you cannot determine equivalence, give the not-equivalent verdict.
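
As a rough illustration of the numeric rule above, the following Python sketch shows one way to read it. The function name `numerically_equivalent`, the `tol` parameter, and the choice to measure the relative difference against the GOLD value are assumptions made for illustration; the rules themselves do not specify them.

```python
def numerically_equivalent(gold: float, candidate: float, tol: float = 1e-6) -> bool:
    """Return True if the absolute OR relative difference is below tol."""
    abs_diff = abs(gold - candidate)
    if abs_diff < tol:
        return True
    # Assumption: the relative difference is taken with respect to GOLD.
    if gold != 0:
        return abs_diff / abs(gold) < tol
    return False
```

Under this reading, 3.5 and 3.50 are equivalent (difference 0), while 3.5 and 3.6 are not.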

Show your reasoning first, then leave a blank line and output your verdict on its own line.

Output format (required at the end, after a blank line):
- If equivalent: [[A=B]] they are equivalent
- If not equivalent: [[A!=B]] they are not equivalent
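
The verdict markers above are meant to be machine-readable. A minimal sketch of how a downstream script might extract them, assuming the last marker in the response is the final verdict (the name `parse_verdict` and that last-marker convention are hypothetical, not part of this prompt):

```python
import re

# Matches either [[A=B]] or [[A!=B]] and captures the operator.
VERDICT_RE = re.compile(r"\[\[A(!?=)B\]\]")

def parse_verdict(response: str):
    """Return True for [[A=B]], False for [[A!=B]], None if no marker is found."""
    matches = VERDICT_RE.findall(response)
    if not matches:
        return None
    # Assumption: if the marker appears more than once, the last one wins.
    return matches[-1] == "="
```

A response with no marker yields `None`, which a caller could treat as not-equivalent, in line with the fallback rule above.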

===== Example 1 (numeric, equivalent) =====
QUESTION:
What is the fold change shown in panel B?

GOLD:
3.5

CANDIDATE:
The fold change is approximately 3.50.

The candidate gives 3.50, which equals 3.5; the difference is 0.

[[A=B]] they are equivalent

===== Example 2 (range, equivalent) =====
QUESTION:
For L1 neurons, what was the peak calcium response in the dark condition?

GOLD:
+/- 0.5

CANDIDATE:
The peak response was approximately 0.48.

The GOLD "+/- 0.5" indicates a tolerance of ±0.5 around the expected value. The candidate gives 0.48, which is within this range.

[[A=B]] they are equivalent

===== Example 3 (string, not equivalent) =====
QUESTION:
Which Gene Ontology category shows the highest statistical significance?

GOLD:
Ionotropic glutamate receptor activity

CANDIDATE:
Synaptic vesicle exocytosis

Different GO categories; not equivalent.

[[A!=B]] they are not equivalent

===== Inputs =====
QUESTION:
{question}

GOLD:
{expected_answer}

CANDIDATE:
{generated_answer}
