Coverage for intelligence_toolkit/query_text_data/prompts.py: 100%
10 statements
coverage.py v7.10.7, created at 2025-10-16 13:41 -0300
# Copyright (c) 2024 Microsoft Corporation. All rights reserved.
# Licensed under the MIT license. See LICENSE file in the project.

from intelligence_toolkit.AI.metaprompts import (
    do_not_disrespect_context,
    do_not_harm_question_answering,
)

chunk_relevance_prompt = """\
You are a helpful assistant determining whether a given chunk of text has the potential to be relevant to a user query.

Text chunk:

{chunk}

User query:

{query}

--Task--
Answer "Yes" if the text chunk contains information that is potentially relevant to the user query, either directly or indirectly, and "No" if it does not.

The text chunk does not need to answer the query, but should contain information that could be used to construct an answer. For example, the following types of information should all be considered relevant:

- specific facts, claims, or statistics
- background information or thematic context
- applicable rules, guidelines, or policies

Answer with a single word, "Yes" or "No", with no additional output.
"""

user_prompt = """\
The final report should:
- be formatted in markdown, with headings indicated by # symbols and subheadings indicated by additional # symbols
- use plain English accessible to non-native speakers and non-technical audiences
- support each sentence with a source reference in the same format as the input content, "[source: <file> (<chunk>), ...]"
- include ALL source references from the extended answer, combining them as needed if they support the same claim
"""
report_prompt = """\
You are a helpful assistant tasked with generating an output from an extended report.

=== TASK ===

User query:

{query}

Extended answer:

{answer}

"""

query_anchoring_prompt = """\
You are a helpful assistant tasked with rewriting a user query in a way that better matches the concepts contained in the dataset.

The output query should retain all the key phrases from the input query, but may expand on them with additional concepts and phrasing to better match relevant concepts in the dataset. Include only the most relevant concepts, especially as specific examples or general categories of concepts present in the user query. If no concepts are relevant, the output query should be the same as the input query. Keep the output query at 1-2 sentences.

The output query should not be more specific than the input query. For example, if the input query is "What are the effects of climate change on the environment?" and "Arctic" is a provided concept, the output query should not be "What are the effects of climate change on the environment in the Arctic?", but could be "What are the effects of climate change on the environment, for example in the Arctic?".

User query:

{query}

Data concepts:

{concepts}

Output query:
"""

theme_summarization_prompt = """\
You are a helpful assistant tasked with creating a JSON object that summarizes a theme relevant to a given user query.

When presenting source evidence, support each sentence with a source reference to the file and text chunk: "[source: <source_id_1>, <source_id_2>, ...]". Include source IDs only - DO NOT include the chunk ID within the source ID - DO NOT repeat the same source ID within a single sentence.

The output object should summarize the theme as follows:

- "theme_title": a title for the theme, in the form of a claim statement supported by the points to follow. Ensure the title is distinct and specific to avoid overlap with other themes
- "point_title": a title for a specific point within the theme, in the form of a claim statement
- "point_evidence": a paragraph, starting with "**Source evidence**:", describing evidence from sources that support or contradict the point, without additional interpretation
- "point_commentary": a paragraph, starting with "**AI commentary**:", suggesting inferences, implications, or conclusions that could be drawn from the source evidence

Pay attention to previous themes, so don't repeat the same themes or points. If the theme hint is similar to a previous theme, return an empty json object ONLY.
IMPORTANT: Make theme titles specific and focused to avoid creating duplicate or overlapping themes. If the theme hint suggests a broad category, make your theme title more specific to the actual content found in the sources.

--Query--

{query}

--Theme hint--

{theme}

--Previous themes--

{previous_themes}

--Source text chunks--

Input text chunks JSON, in the form "<source_id>: <text_chunk>":

{chunks}

Output JSON object:
"""

theme_integration_prompt = """\
You are a helpful assistant tasked with creating a JSON object that organizes content relevant to a given user query.

The output object should integrate the theme summaries provided as input as follows:

- "answer": a standalone and detailed answer to the user query, derived from the points and formatted according to the user query/prompt. Quote directly from source text where appropriate, and provide a source reference for each quote
- "report_title": a title for the final report that reflects the overall theme of the content and the user query it answers, in the form of a claim statement. Should not contain punctuation or special characters beyond spaces and hyphens
- "report_overview": an introductory paragraph that provides an overview of the report themes in a neutral way without offering interpretations or implications
- "report_implications": a concluding paragraph that summarizes the implications of the themes and their specific points

IMPORTANT: Before generating the final output, consolidate any duplicate or overlapping themes. Merge themes that cover similar concepts or have overlapping content. Ensure each theme represents a distinct aspect of the analysis.

When presenting evidence, support each sentence with one or more source references: "[source: <source_id_1>, <source_id_2>,...]". Include source IDs only - DO NOT include the chunk ID within the source ID - DO NOT repeat the same source ID within a single sentence.


--Theme summaries--

{content}

--User query--

{query}

Output JSON object:
"""

commentary_prompt = """\
You are a helpful assistant tasked with providing commentary on a set of themes derived from source texts.

Provide commentary both on the overall thematic structure and specific examples drawn from the sample source texts.

When presenting evidence, support each sentence with one or more source references: "[source: <source_id_1>, <source_id_2>, ...]". Include source IDs only - DO NOT include the chunk ID within the source ID - DO NOT repeat the same source ID within a single sentence.

--User query--

{query}

--Themes--

{structure}

--Sample source texts--

{chunks}

--Output commentary--

"""

thematic_update_prompt = """\
You are a helpful assistant tasked with creating a JSON object that updates a thematic organization of points relevant to a user query.

The output object should capture new themes, points, and source references that should be added to or modify the existing thematic structure:

- "updates": an array of objects, each representing an update to a point derived from the input text chunks
- "point_id": the ID of the point to update, else the next available point ID if creating a new point
- "point_title": the title of the point to update or create, expressed as a full and detailed sentence. If the existing point title is unchanged, the field should be left blank
- "source_ids": an array of source IDs that support the point, to be added to the existing source IDs for the point
- "theme_title": the title of a theme that organizes a set of related points.

--Rules--

- Each point MUST contain sufficient concrete details to capture the specific source information only, and not related information
- If a source relates to an existing point, the source ID MUST be assigned to the existing point ID, rather than creating a new point
- If the addition of a source to a point warrants a change in point title, the point title MUST be updated
- Aim for 2-7 themes overall, with an even distribution of points across themes
- Points should be assigned to a single theme in a logical sequence that addresses the user query
- Themes should contain at least two points if possible
- Order themes in a logical sequence that addresses the user query
- Output themes need not be the same as input themes and should be regenerated as needed to maintain 2-7 themes overall
- AVOID creating duplicate or overlapping themes - consolidate similar themes under a single, more comprehensive theme title
- Before creating a new theme, check if the content could be merged with an existing theme
- Theme titles should be distinct and non-overlapping - avoid themes that cover the same conceptual territory

--User query--

{query}

--Existing thematic structure--

{structure}

--New sources by source ID--

{sources}

--Output JSON object--

"""

list_prompts = {
    "report_prompt": report_prompt,
    "user_prompt": user_prompt,
    "safety_prompt": " ".join([
        do_not_harm_question_answering,
        do_not_disrespect_context,
    ]),
}
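
The prompt templates in this module mark their inputs with named placeholders such as {chunk} and {query}, which matches Python's str.format syntax. The listing does not show how the toolkit fills them or reads the model's reply, so the following is only a minimal sketch under stated assumptions. CHUNK_RELEVANCE_TEMPLATE is an abbreviated stand-in for chunk_relevance_prompt, and template_fields, fill_template, and parse_yes_no are hypothetical helpers that are not part of intelligence_toolkit.

```python
import string

# Abbreviated stand-in for chunk_relevance_prompt; not the full prompt text.
CHUNK_RELEVANCE_TEMPLATE = """\
Text chunk:

{chunk}

User query:

{query}

Answer with a single word, "Yes" or "No", with no additional output.
"""


def template_fields(template: str) -> set[str]:
    """Return the named replacement fields that str.format expects."""
    return {
        field_name
        for _, field_name, _, _ in string.Formatter().parse(template)
        if field_name
    }


def fill_template(template: str, **values: str) -> str:
    """Fill a template, failing early with a clear message if a field is missing."""
    missing = template_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing template fields: {sorted(missing)}")
    return template.format(**values)


def parse_yes_no(answer: str) -> bool:
    """Interpret a single-word reply (hypothetical helper, not from the source)."""
    # Tolerate surrounding whitespace, quotes, and a trailing period.
    normalized = answer.strip().strip('".').lower()
    if normalized not in {"yes", "no"}:
        raise ValueError(f"unexpected answer: {answer!r}")
    return normalized == "yes"


if __name__ == "__main__":
    prompt = fill_template(
        CHUNK_RELEVANCE_TEMPLATE,
        chunk="Sea ice extent has declined since 1979.",
        query="What are the effects of climate change on the environment?",
    )
    print(prompt)
    print(parse_yes_no("Yes"))
```

Because str.format parses only the template, braces inside the substituted chunk or query text are inserted literally and do not need escaping. Literal braces in the template itself would need to be doubled.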