Coverage for intelligence_toolkit/generate_mock_data/prompts.py: 100%
3 statements
coverage.py v7.10.7, created at 2025-10-16 13:41 -0300
unseeded_data_generation_prompt = """
You are a helpful assistant tasked with generating a JSON object following the JSON schema provided.

You should generate a new object that adheres to the schema and contains mock data that is plausible but not linked to any real-world entities (e.g., person, organization). All output data records should be unrelated and contain substantial variety (e.g., covering all enum/binary values). The content of the generated records and fields should follow the guidance below, if provided.

--Generation guidance--

{generation_guidance}

--Primary Record Array--

{primary_record_array}

--Record Targets--

Output record count: {total_records}
"""

seeded_data_generation_prompt = """
You are a helpful assistant tasked with generating a JSON object following the JSON schema provided. You should generate mock data that is plausible but not linked to any real-world entities (e.g., person, organization).

The JSON object may contain multiple arrays representing collections of data records. For the purposes of this task, only consider the primary record array specified when counting records. Generate any other auxiliary record arrays as needed to complete and/or connect these primary records.

The seed record provided should be used to generate certain numbers of records in the output object that are either near duplicates or close relations of the seed record, as follows:

- Near duplicate: A record that is very similar to the seed record but not identical, with minor variations in data fields but recognisable as the same real-world entity.
- Close relation: A record that is related to the seed record, but not a direct duplicate, with some shared data fields or common attributes indicating a close real-world relationship between distinct entities.

Once the target numbers of near duplicates and close relations have been generated, generate the remaining records. These records should be unrelated to the seed record and contain substantial variety (e.g., covering all enum/binary values in the data schema).

Do not include the seed record in the output object.

It should be possible to reconstruct the input record from the generated text, but the text should not simply list the values of the input record. Ensure that all values are embedded in a narrative that is plausible and coherent. Values may be aggregated, but they must ALL be present individually in the generated text.

The content of the generated records and fields should follow the guidance below, if provided.

--Generation guidance--

{generation_guidance}

--Primary Record Array--

{primary_record_array}

--Seed Record--

{seed_record}

--Record Targets--

{record_targets}
"""

text_generation_prompt = """
You are a helpful assistant tasked with generating a text document consistent with the input record provided.

You can fabricate any narrative that is consistent with the input record, but do not fabricate additional data values that are not present in the input text.


--Generation guidance--

{generation_guidance}

--Input text--

{input_text}

"""