Coverage for intelligence_toolkit/tests/unit/AI/test_text_splitter.py: 100%
16 statements
coverage.py v7.10.7, created at 2025-10-16 13:41 -0300
# Copyright (c) 2024 Microsoft Corporation. All rights reserved.
# Licensed under the MIT license. See LICENSE file in the project.
#
import pytest

from intelligence_toolkit.AI.text_splitter import TextSplitter


@pytest.fixture()
def text_splitter():
    """Return a TextSplitter built with the argument 20 (presumably the chunk size)."""
    return TextSplitter(20)


@pytest.fixture()
def text_example():
    """Return a multi-line sample text, one paragraph per line."""
    return """
    The Intelligence toolkit aims to detect and explain patterns, relationships, and risks in data provided by the user. It is not designed to make decisions or take actions based on these findings.
    The statistical "insights" that it detects may not be insightful or useful in practice, and will inherit any biases, errors, or omissions present in the data collecting/generating process. These may be further amplified by the AI interpretations and reports generated by the toolkit.
    The generative AI model may itself introduce additional statistical or societal biases, or fabricate information not present in its grounding data, as a consequence of its training and design.
    Users should be experts in their domain, familiar with the data, and both able and willing to evaluate the quality of the insights and AI interpretations before taking action.
    The system was designed and tested for the processing of English language data and the creation of English language outputs. Performance in other languages may vary and should be assessed by someone who is both an expert on the data and a native speaker of that language.
"""


def test_text_splitter(text_splitter, text_example):
    """Each paragraph is split into chunks that never cross a line break."""
    chunks = text_splitter.split(text_example)

    expected_chunks = [
        "The Intelligence toolkit aims to detect and explain patterns, relationships, and risks in data provided by the",
        "user. It is not designed to make decisions or take actions based on these findings.",
        'The statistical "insights" that it detects may not be insightful or useful in practice, and will',
        "inherit any biases, errors, or omissions present in the data collecting/generating process. These may",
        "be further amplified by the AI interpretations and reports generated by the toolkit.",
        "The generative AI model may itself introduce additional statistical or societal biases, or fabricate information not present in",
        "its grounding data, as a consequence of its training and design.",
        "Users should be experts in their domain, familiar with the data, and both able and willing to evaluate",
        "the quality of the insights and AI interpretations before taking action.",
        "The system was designed and tested for the processing of English language data and the creation of English language",
        "outputs. Performance in other languages may vary and should be assessed by someone who is both an expert on",
        "the data and a native speaker of that language.",
    ]
    assert chunks == expected_chunks


def test_text_splitter_empty_input(text_splitter):
    """An empty string yields no chunks."""
    text = ""
    chunks = text_splitter.split(text)
    assert chunks == []
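
The tests above pin down two properties of `split`: chunks never span a line break, and empty input produces an empty list. As a rough illustration of that contract only, here is a minimal sketch of a stand-in splitter. `WordChunkSplitter` is a hypothetical name, not part of intelligence_toolkit, and it counts whitespace-separated words. The real `TextSplitter` is not shown in this report and most likely measures chunk size differently, so this sketch is not expected to reproduce `expected_chunks` exactly.

```python
from typing import List


class WordChunkSplitter:
    """Hypothetical stand-in for illustration; NOT the toolkit's TextSplitter."""

    def __init__(self, chunk_size: int) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        # Assumption: chunk_size is a maximum number of words per chunk.
        self.chunk_size = chunk_size

    def split(self, text: str) -> List[str]:
        chunks: List[str] = []
        # Process each line separately so that no chunk crosses a line break.
        for line in text.splitlines():
            # str.split() with no arguments drops leading/trailing whitespace
            # and returns [] for blank lines, so blank lines add no chunks.
            words = line.split()
            for start in range(0, len(words), self.chunk_size):
                chunks.append(" ".join(words[start:start + self.chunk_size]))
        return chunks
```

Under these assumptions, `WordChunkSplitter(3).split("a b c d")` yields two chunks, and an empty string yields an empty list, mirroring `test_text_splitter_empty_input`.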