Coverage for intelligence_toolkit/query_text_data/prompts.py: 100%

10 statements  

« prev     ^ index     » next       coverage.py v7.10.7, created at 2025-10-16 13:41 -0300

1# Copyright (c) 2024 Microsoft Corporation. All rights reserved. 

2# Licensed under the MIT license. See LICENSE file in the project. 

3 

4from intelligence_toolkit.AI.metaprompts import ( 

5 do_not_disrespect_context, 

6 do_not_harm_question_answering, 

7) 

8 

9chunk_relevance_prompt = """\ 

10You are a helpful assistant determining whether a given chunk of text has the potential to be relevant to a user query. 

11 

12Text chunk: 

13 

14{chunk} 

15 

16User query: 

17 

18{query} 

19 

20--Task-- 

21Answer "Yes" if the text chunk contains information that is potentially relevant to the user query, either directly or indirectly, and "No" if it does not. 

22 

23The text chunk does not need to answer the query, but should contain information that could be used to construct an answer. For example, the following types of information should all be considered relevant: 

24 

25- specific facts, claims, or statistics 

26- background information or thematic context 

27- applicable rules, guidelines, or policies 

28 

29Answer with a single word, "Yes" or "No", with no additional output. 

30""" 

31 

32user_prompt = """\ 

33The final report should: 

34- be formatted in markdown, with headings indicated by # symbols and subheadings indicated by additional # symbols 

35- use plain English accessible to non-native speakers and non-technical audiences 

36- support each sentence with a source reference in the same format as the input content, "[source: <file> (<chunk>), ...]" 

37- include ALL source references from the extended answer, combining them as needed if they support the same claim 

38""" 

39report_prompt = """\ 

40You are a helpful assistant tasked with generating an output from an extended report. 

41 

42=== TASK === 

43 

44User query: 

45 

46{query} 

47 

48Extended answer: 

49 

50{answer} 

51 

52""" 

53 

54query_anchoring_prompt = """\ 

55You are a helpful assistant tasked with rewriting a user query in a way that better matches the concepts contained in the dataset. 

56 

57The output query should retain all the key phrases from the input query, but may expand on them with additional concepts and phrasing to better match relevant concepts in the dataset. Include only the most relevant concepts, especially as specific examples or general categories of concepts present in the user query. If no concepts are relevant, the output query should be the same as the input query. Keep the output query at 1-2 sentences. 

58 

59The output query should not be more specific than the input query. For example, if the input query is "What are the effects of climate change on the environment?" and "Arctic" is a provided concept, the output query should not be "What are the effects of climate change on the environment in the Arctic?", but could be "What are the effects of climate change on the environment, for example in the Arctic?". 

60 

61User query: 

62 

63{query} 

64 

65Data concepts: 

66 

67{concepts} 

68 

69Output query: 

70""" 

71 

72theme_summarization_prompt = """\ 

73You are a helpful assistant tasked with creating a JSON object that summarizes a theme relevant to a given user query. 

74 

75When presenting source evidence, support each sentence with a source reference to the file and text chunk: "[source: <source_id_1>, <source_id_2>, ...]". Include source IDs only - DO NOT include the chunk ID within the source ID - DO NOT repeat the same source ID within a single sentence. 

76 

77The output object should summarize the theme as follows: 

78 

79- "theme_title": a title for the theme, in the form of a claim statement supported by the points to follow. Ensure the title is distinct and specific to avoid overlap with other themes 

80- "point_title": a title for a specific point within the theme, in the form of a claim statement 

81- "point_evidence": a paragraph, starting with "**Source evidence**:", describing evidence from sources that support or contradict the point, without additional interpretation 

82- "point_commentary": a paragraph, starting with "**AI commentary**:", suggesting inferences, implications, or conclusions that could be drawn from the source evidence 

83 

84Pay attention to previous themes, so don't repeat the same themes or points. If the theme hint is similar to a previous theme, return an empty json object ONLY. 

85IMPORTANT: Make theme titles specific and focused to avoid creating duplicate or overlapping themes. If the theme hint suggests a broad category, make your theme title more specific to the actual content found in the sources. 

86 

87--Query-- 

88 

89{query} 

90 

91--Theme hint-- 

92 

93{theme} 

94 

95--Previous themes-- 

96 

97{previous_themes} 

98 

99--Source text chunks-- 

100 

101Input text chunks JSON, in the form "<source_id>: <text_chunk>": 

102 

103{chunks} 

104 

105Output JSON object: 

106""" 

107 

108theme_integration_prompt = """\ 

109You are a helpful assistant tasked with creating a JSON object that organizes content relevant to a given user query. 

110 

111The output object should integrate the theme summaries provided as input as follows: 

112 

113- "answer": a standalone and detailed answer to the user query, derived from the points and formatted according to the user query/prompt. Quote directly from source text where appropriate, and provide a source reference for each quote 

114- "report_title": a title for the final report that reflects the overall theme of the content and the user query it answers, in the form of a claim statement. Should not contain punctuation or special characters beyond spaces and hyphens 

115- "report_overview": an introductory paragraph that provides an overview of the report themes in a neutral way without offering interpretations or implications 

116- "report_implications": a concluding paragraph that summarizes the implications of the themes and their specific points 

117 

118IMPORTANT: Before generating the final output, consolidate any duplicate or overlapping themes. Merge themes that cover similar concepts or have overlapping content. Ensure each theme represents a distinct aspect of the analysis. 

119 

120When presenting evidence, support each sentence with one or more source references: "[source: <source_id_1>, <source_id_2>,...]". Include source IDs only - DO NOT include the chunk ID within the source ID - DO NOT repeat the same source ID within a single sentence. 

121 

122 

123--Theme summaries-- 

124 

125{content} 

126 

127--User query-- 

128 

129{query} 

130 

131Output JSON object: 

132""" 

133 

134commentary_prompt = """\ 

135You are a helpful assistant tasked with providing commentary on a set of themes derived from source texts. 

136 

137Provide commentary both on the overall thematic structure and specific examples drawn from the sample source texts. 

138 

139When presenting evidence, support each sentence with one or more source references: "[source: <source_id_1>, <source_id_2>, ...]". Include source IDs only - DO NOT include the chunk ID within the source ID - DO NOT repeat the same source ID within a single sentence. 

140 

141--User query-- 

142 

143{query} 

144 

145--Themes-- 

146 

147{structure} 

148 

149--Sample source texts-- 

150 

151{chunks} 

152 

153--Output commentary-- 

154 

155""" 

156 

157thematic_update_prompt = """\ 

158You are a helpful assistant tasked with creating a JSON object that updates a thematic organization of points relevant to a user query. 

159 

160The output object should capture new themes, points, and source references that should be added to or modify the existing thematic structure: 

161 

162- "updates": an array of objects, each representing an update to a point derived from the input text chunks 

163- "point_id": the ID of the point to update, else the next available point ID if creating a new point 

164- "point_title": the title of the point to update or create, expressed as a full and detailed sentence. If the existing point title is unchanged, the field should be left blank 

165- "source_ids": an array of source IDs that support the point, to be added to the existing source IDs for the point 

166- "theme_title": the title of a theme that organizes a set of related points. 

167 

168--Rules-- 

169 

170- Each point MUST contain sufficient concrete details to capture the specific source information only, and not related information 

171- If a source relates to an existing point, the source ID MUST be assigned to the existing point ID, rather than creating a new point 

172- If the addition of a source to a point warrants a change in point title, the point title MUST be updated 

173- Aim for 2-7 themes overall, with an even distribution of points across themes 

174- Points should be assigned to a single theme in a logical sequence that addresses the user query 

175- Themes should contain at least two points if possible 

176- Order themes in a logical sequence that addresses the user query 

177- Output themes need not be the same as input themes and should be regenerated as needed to maintain 2-7 themes overall 

178- AVOID creating duplicate or overlapping themes - consolidate similar themes under a single, more comprehensive theme title 

179- Before creating a new theme, check if the content could be merged with an existing theme 

180- Theme titles should be distinct and non-overlapping - avoid themes that cover the same conceptual territory 

181 

182--User query-- 

183 

184{query} 

185 

186--Existing thematic structure-- 

187 

188{structure} 

189 

190--New sources by source ID-- 

191 

192{sources} 

193 

194--Output JSON object-- 

195 

196""" 

197 

198list_prompts = { 

199 "report_prompt": report_prompt, 

200 "user_prompt": user_prompt, 

201 "safety_prompt": " ".join([ 

202 do_not_harm_question_answering, 

203 do_not_disrespect_context, 

204 ]), 

205}