Below is a pair of answers to a given question. One is a response generated by a model, and the other is a human-annotated correct answer. Please evaluate how well the generated answer captures the meaning of the correct answer. Give the evaluation as an integer between 0 and 100 based on the criteria below, and provide a brief explanation for your rating. Provide the output in the format specified under Output Format.
Evaluation Criteria:
- 100: The generated answer fully captures the meaning of the correct answer.
- 75: The generated answer captures most of the meaning of the correct answer but lacks some details.
- 50: The answers partially match, but important information is missing.
- 25: There is slight relevance, but most of the content does not match.
- 0: The meanings do not match at all.
Output Format:
Score: [Integer between 0 and 100]
Explanation: [Brief explanation for your rating]

Question: {question}

Generated Answer: {predicted_answer}

Correct Answer: {gold_answer}

Score: