Metadata-Version: 2.4
Name: ParUtils
Version: 1.3.2
Summary: This package contains a bunch of Python utilities developed for Support, Test and Automation IT Engineers
Author-email: Paul ARNAUD <paularnaud2@gmail.com>
Project-URL: Homepage, https://github.com/paularnaud2/ParUtils
Project-URL: Issues, https://github.com/paularnaud2/ParUtils/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# ParUtils
 
This package is a light version of ParTools, requiring no third-party dependencies. It includes a reworked logging feature, as well as string and file handling features.
You can mainly use them for:
 
- Logging information in a file
- Loading / saving text files
- Listing files
- Loading / saving csv files (lighter and more performant than the built-in csv module)
- Creating hash strings and random strings, comparing strings with a wildcard character
- Comparing files and lists
- Listing and removing duplicates from a list
 
## QuickStart
  
    pip install parutils
 
You can start by testing the logger with the following code:

    import parutils as u

    u.Logger()  # initializes a log file
    u.log('Hello World')  # logs something in the console and in the log file

You should then see something like this in the console:
 
    Log file initialised (c:\Dev\ParUtils\log\20221213_072132.txt)
    CWD: c:\Dev\ParUtils
    Python interpreter path: C:\Python\python.exe
    Python version: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    ParUtils version: 1.0.8

    07:21:32 - Hello World

## Examples of useful functions
 
ParUtils provides generic functions meant to be reused by external packages. In this section, a few of these functions are listed. For an exhaustive list, you can check out  __parutils/\_\_init\_\_.py__.

Manipulating files
- `save_list`: saves a list into a text file
- `load_txt`: loads a text file into a string or a list
- `load_csv`: loads a csv file into a list of lists
- `save_csv`: saves a list of lists into a csv file
- `list_files`: lists files in a given directory (possibility to recurse subdirectories)

Manipulating strings
- `like`: behaves like the LIKE operator of Oracle SQL (you can match strings with the wildcard character '\*'). It returns a `re.Match` object giving you access to the substrings matched by the wildcards.  
Example: ``m = like('Hello World', 'He*o W*d')``, ``m.group(1)`` => 'll'
- `like_list` / `like_dict`: apply the `like` function directly to lists or dictionaries (see doc).
- `big_number`: converts a potentially big number into a readable string.  
Example: ``big_number(10000000)`` => '10 000 000'.
- `get_duration_string`: outputs a string representing the time elapsed since the input ``start_time``.  
Example: ``get_duration_string(0, end_time=200)`` => '3 minutes and 20 seconds'.
- `hash512`: creates a non-randomised hash string from a string.
- `gen_random_string`: generates a random string.
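
To make the wildcard matching described for `like` more concrete, here is a minimal standard-library sketch of the idea. The function name `like_sketch` and its `case_sensitive` parameter are hypothetical and chosen for illustration; this is not the ParUtils implementation, and the real `like` may differ in its defaults and options.

```python
import re


def like_sketch(in_str, pattern, case_sensitive=True):
    """Match in_str against a pattern where '*' stands for any run of characters.

    Hypothetical stand-in for illustration only, not the ParUtils code.
    """
    # Escape each literal chunk, then join the chunks with a capturing group
    # so the text matched by each wildcard is available through m.group(n).
    chunks = [re.escape(chunk) for chunk in pattern.split('*')]
    regex = '(.*)'.join(chunks)
    flags = re.DOTALL if case_sensitive else re.DOTALL | re.IGNORECASE
    # fullmatch requires the whole string to fit the pattern
    return re.fullmatch(regex, in_str, flags)


m = like_sketch('Hello World', 'He*o W*d')
print(m.group(1))  # ll
print(m.group(2))  # orl
```

Each `*` becomes one capturing group, which is why the matched substrings can be read back by index.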

Data quality features
- `diff_list`: compares two lists
- `file_match`: compares two files
- `find_dup_list`: finds duplicates in a list
- `del_dup_list`: removes duplicates from a list
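
The list operations above can be pictured with a short standard-library sketch. The helper names below (`find_duplicates`, `remove_duplicates`, `diff_lists`) are hypothetical and only illustrate the kind of result to expect; they are not the ParUtils functions, whose signatures and return values may differ. The sketch assumes the list items are hashable.

```python
from collections import Counter


def find_duplicates(items):
    """Return the values that appear more than once, in first-seen order."""
    counts = Counter(items)
    return [value for value in dict.fromkeys(items) if counts[value] > 1]


def remove_duplicates(items):
    """Return the list without repeated values, keeping the first occurrence."""
    # dict preserves insertion order, so the original order is kept
    return list(dict.fromkeys(items))


def diff_lists(left, right):
    """Return (only_in_left, only_in_right) for two lists."""
    left_set, right_set = set(left), set(right)
    only_left = [x for x in left if x not in right_set]
    only_right = [x for x in right if x not in left_set]
    return only_left, only_right


data = ['a', 'b', 'a', 'c', 'b']
print(find_duplicates(data))    # ['a', 'b']
print(remove_duplicates(data))  # ['a', 'b', 'c']
print(diff_lists(['a', 'b', 'c'], ['b', 'c', 'd']))  # (['a'], ['d'])
```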

 
## Logging with parutils
 
### Basic usage guide
The `log` function and the `Logger` class are directly available from the parutils package. So you can do:

    import parutils as u

    u.Logger()
    u.log('Hello World')

Note: if you want the `log` function to actually write in a log file, you have to create a ``Logger`` object before using it, otherwise it will just print out the log info in the console.

The relevant parameters, such as the log directory or the log format, can be specified when initializing the ``Logger`` object. The default ``log_format`` is ``'%H:%M:%S -'``, and a default log line looks like:
 
    19:45:04 - This line has been generated by the parutils.log function
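
The default format looks like a `strftime` pattern. Assuming that is how it is applied (an assumption, not confirmed here), the prefix of a log line could be produced along these lines. The helper `format_log_line` is hypothetical and is not part of ParUtils.

```python
from datetime import datetime


def format_log_line(message, log_format='%H:%M:%S -', now=None):
    """Prefix a message with a timestamp rendered by strftime.

    Hypothetical helper: assumes log_format is a strftime pattern.
    """
    now = now or datetime.now()
    return f"{now.strftime(log_format)} {message}"


# A fixed timestamp is used so the output is reproducible
line = format_log_line('Hello World', now=datetime(2022, 12, 13, 19, 45, 4))
print(line)  # 19:45:04 - Hello World
```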


Note that the default constants for the logging subpackage are stored in __parutils.logging.const__. So for example, if you want to overwrite the default value for the logging directory, you can do:

    import parutils as u

    u.logging.const.DEFAULT_DIR = '<my_custom_dir>'
 
### About the step_log function
The `step_log` function allows you to log some information only when the input ``counter`` is a multiple of the input ``step``. Thus, `step_log` is to be used in loops to __track the progress of long processes__ such as reading or writing millions of lines in a file. The ``what`` input expects a description of what is being counted. Its default value is ``'lines written'``.  
In order to correctly measure the elapsed time for the first log line, the ``step_log`` function has to be initialised by running ``init_sl_timer()``.  
So for example, if you input ``step=500`` and don't input any ``what`` value, you should get something like this:
 
    19:45:04 - 500 lines written in 3 ms. 500 lines written in total.
    19:45:04 - 500 lines written in 2 ms. 1 000 lines written in total.
    19:45:04 - 500 lines written in 2 ms. 1 500 lines written in total.
 
Check out the __test_logging.py__ file in __tests/logging__ for simple examples of use.
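
The step logic described above can be sketched with the standard library as follows. This is a hypothetical re-implementation for illustration (`init_timer` and `step_log_sketch` are made-up names), not the ParUtils code: it returns the progress line instead of logging it, and the real `step_log` signature may differ.

```python
import time

_last_time = time.monotonic()


def init_timer():
    """Reset the reference time used to measure the first step."""
    global _last_time
    _last_time = time.monotonic()


def step_log_sketch(counter, step, what='lines written'):
    """Return a progress line when counter is a multiple of step, else None."""
    global _last_time
    if counter % step != 0:
        return None
    now = time.monotonic()
    elapsed_ms = int((now - _last_time) * 1000)
    _last_time = now
    # Thousands are separated by spaces, as in the sample output above
    total = f"{counter:,}".replace(',', ' ')
    return f"{step} {what} in {elapsed_ms} ms. {total} {what} in total."


init_timer()
lines = []
for i in range(1, 1501):
    msg = step_log_sketch(i, step=500)
    if msg:
        lines.append(msg)
        print(msg)
```

With 1 500 iterations and a step of 500, the loop produces three progress lines. The elapsed times depend on the machine.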
 
### About the retry mechanism
If the logger is initialized, the log information is written to a file each time you call the `log` function. This file write can fail, especially if your log file is located on a network drive.

ParUtils has a built-in retry mechanism for log file writing failures. Here is how it works:  
Whenever writing a log entry fails, a warning like this one is printed:

    Warning: the following message couldn't be logged because of <error>: <logged message>

This warning is stored in a buffer, and the logger tries to write the buffer to the log file the next time it has something to log. If that attempt also fails, another warning is printed and added to the buffer, and so on, up to a limit. When the size of the buffer exceeds 10, the logger stops trying to write to the file and only prints the logs in the console.
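
The retry behaviour can be pictured with the following sketch. The class `BufferedFileLogger`, its attributes, and the injected `write_func` are hypothetical names introduced for illustration, and failures are modelled as `OSError` by assumption; the actual ParUtils logger may be organised differently.

```python
class BufferedFileLogger:
    """Hypothetical logger that retries failed file writes on the next call."""

    MAX_BUFFER = 10  # limit described above; the constant name is an assumption

    def __init__(self, write_func):
        self.write = write_func   # callable writing text; may raise OSError
        self.buffer = []          # warnings waiting to be written
        self.file_enabled = True

    def log(self, message):
        print(message)  # the console always gets the message
        if not self.file_enabled:
            return
        pending = self.buffer + [message]
        try:
            self.write('\n'.join(pending) + '\n')
            self.buffer = []  # everything was written
        except OSError as error:
            warning = ("Warning: the following message couldn't be logged "
                       f"because of {error}: {message}")
            print(warning)
            self.buffer.append(warning)
            if len(self.buffer) > self.MAX_BUFFER:
                self.file_enabled = False  # stop trying to write to the file


# Simulate a drive that fails twice and then recovers
written = []
failures = iter([True, True, False])


def flaky_write(text):
    if next(failures, False):
        raise OSError('network drive unavailable')
    written.append(text)


logger = BufferedFileLogger(flaky_write)
for msg in ('first', 'second', 'third'):
    logger.log(msg)
```

In this run, the two warnings accumulated during the failures are written together with the third message once the write succeeds.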

### About the log_every argument
Because each logged message implies opening a file, writing to it, and closing it, logging can severely affect performance, especially when logging to a network location or, more generally, to a slow drive. The `log_every` argument of the `Logger` constructor lets you mitigate this by writing to the file only every `log_every` log messages, using a buffer mechanism. By default, `log_every` is set to 1, so the log file is written for every log message. For example, if you set `log_every` to 10, the log file is only written every 10 log entries, while every log entry is still printed live in the console.
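
The buffering idea behind `log_every` can be sketched as follows. `EveryNLogger` and its `flush` method are hypothetical names used for illustration only; how ParUtils writes out the messages still in the buffer at the end of a run is not covered here.

```python
class EveryNLogger:
    """Hypothetical logger that writes to its sink once every log_every messages."""

    def __init__(self, write_func, log_every=1):
        self.write = write_func  # callable receiving the text to write
        self.log_every = log_every
        self.pending = []

    def log(self, message):
        print(message)  # every message is still printed live
        self.pending.append(message)
        if len(self.pending) >= self.log_every:
            self.flush()

    def flush(self):
        """Write all pending messages in a single call."""
        if self.pending:
            self.write('\n'.join(self.pending) + '\n')
            self.pending = []


writes = []  # stands in for the log file: one entry per write operation
logger = EveryNLogger(writes.append, log_every=10)
for i in range(25):
    logger.log(f"message {i}")
print(len(writes))  # 2
logger.flush()      # write the 5 remaining messages
print(len(writes))  # 3
```

Here 25 messages result in only three write operations instead of 25, which is the trade-off the argument is meant to offer.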
