TODO - this should be moved to MedCATtutorials if/when the version with the logging changes is released

MedCAT tutorial - logging with MedCAT

The way MedCAT handles logging has changed recently. The idea is that MedCAT, as a library, should interfere as little as possible with its users' choice of what, how and where to log.

The current strategy is 'opt in', which means that by default, all logging is disabled by MedCAT. However, we've added a shorthand for adding handlers that log to the console as well as to the medcat.log file. On top of that, it is pretty simple for the user to change the logging behaviour of different parts of the project separately. We will go over that in small examples below.

First of all, we want to import MedCAT and make sure that the version we're looking at includes the newer logging functionality.

In [40]:
# import medcat
import medcat
# we will use the below later
from medcat import config
from medcat import cat
import os
# print out version string
print(medcat.__version__)
# make sure there is a logger
if not hasattr(medcat, 'logger'):
    raise ValueError("This is an incompatible version!")
print("The package logger:", medcat.logger)

def reset_all_logger_handlers(log_file='temp_medcat.log'): # reset logger handlers in case a block is run multiple times
    medcat.logger.handlers = medcat.logger.handlers[:1] # include the default NullHandler
    config.logger.handlers = []
    cat.logger.handlers = []
    # remove temp log file if it exists
    if os.path.exists(log_file):
        os.remove(log_file)
1.3.1.dev51
The package logger: <Logger medcat (WARNING)>

What we must now understand is that the logging library organises loggers in a hierarchy. That means that all the module-level loggers within MedCAT have medcat.logger (the package-level logger) as their parent logger. So if we want to change the logging behaviour for the entire project, we can just interact with this one logger. However, if we want fine-grained control, we can interact with each module-level logger separately.
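
The hierarchy can be illustrated with the standard library alone. The following is a minimal sketch, not MedCAT code: the logger names demo_pkg, demo_pkg.config and demo_pkg.cat as well as the ListHandler helper are hypothetical stand-ins for a package and two of its modules. It shows that a handler on the package-level logger receives messages from every module-level logger, while a handler on one module-level logger only receives that module's messages.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collect messages in a list so they are easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}: {record.getMessage()}")


# hypothetical names standing in for a package and two of its modules
package_logger = logging.getLogger("demo_pkg")
config_logger = logging.getLogger("demo_pkg.config")
cat_logger = logging.getLogger("demo_pkg.cat")

# the dotted name makes the package logger the parent of the module loggers
print(config_logger.parent is package_logger)

package_handler = ListHandler()
config_handler = ListHandler()
package_logger.addHandler(package_handler)
config_logger.addHandler(config_handler)

# records propagate upwards from the module loggers to the package logger
config_logger.warning("from config")
cat_logger.warning("from cat")

print("package handler saw:", package_handler.messages)
print("config handler saw:", config_handler.messages)
```

The package-level handler sees both warnings, whereas the handler attached to the config logger sees only the one logged through that logger.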

The shorthand for logging

We have created a shorthand method that enables logging to the console as well as to the medcat.log file. This is the medcat.add_default_log_handlers method. If you call it without any arguments, it acts on the package-level logger and uses the default file mentioned above. However, the user can also call this method with any other logger and/or target file name.

In [41]:
log_file = 'temp_medcat.log'
import os
# the default behaviour is to not log anything, the following should thus not create any output
medcat.logger.warning("This should be ignored")
print('Log file should not exists, does it?', os.path.exists(log_file))
# enable default logging for the package-level logger
medcat.add_default_log_handlers(target_file=log_file)
# now we should have a console handler as well as a file handler writing to log_file
# so we should see the following output in both
msg = "This message should show up in console and be written to medcat.log"
medcat.logger.warning(msg)
with open(log_file, 'r') as f:
    last_line = f.readlines()[-1][:-1] # last line, ignoring the newline char
    print("Last line should be equal to our message", msg == last_line)
reset_all_logger_handlers(log_file) # for cleanup
This message should show up in console and be written to medcat.log
Log file should not exists, does it? False
Last line should be equal to our message True

The above example was trivial since we called the logger ourselves. In production, the log calls would instead happen as a side effect of running MedCAT's own code. But since that code acts on the same logger instance, the result will be the same.

The other thing to note is the fact that the above example changes the package-level logger. That is, it will change the logging behaviour within the entire project. However, as mentioned above, one can do this for each module separately as well.

Every module that needs to log something defines a module-level variable named logger. Adjusting this logger changes the behaviour of that logger only, and thus only affects that module's output.

So we will now try to show that a little more precisely. In order to do that, we will use the logger attached to medcat.config.

In [42]:
import logging
config.logger.addHandler(logging.StreamHandler())
# now, the medcat.logger won't log into console
medcat.logger.error("This error does not show up")
# however, the config.logger will
config.logger.warning("This warning will show up")
# and at the same time, we can see that the logger of cat won't log anything either
cat.logger.warning("This warning will not show up either")
reset_all_logger_handlers() # for cleanup
This warning will show up

Adding a handler that logs into a file

Of course, one can also add a handler that logs into a file, just like we saw with the default handlers above. We can use this to have different modules in the project log to different files.

In [43]:
target_file_config = 'medcat_config.log' # some target log file for config logger
target_file_cat = 'medcat_cat.log' # different log file for cat
# set up different file handlers for the two different modules
config.logger.addHandler(logging.FileHandler(target_file_config))
cat.logger.addHandler(logging.FileHandler(target_file_cat))
# config now logs into a different file than cat
# i.e. the following gets logged into config's log file
config.logger.warning("There has been an issue with the config")
# and the following gets logged into cat's log file
cat.logger.error("There was a critical issue in CAT")
# we can check that by looking at the files
with open(target_file_config) as f:
    config_contents = f.read()
with open(target_file_cat) as f:
    cat_contents = f.read()
print('Config log contents:\n', config_contents)
print('CAT log contents:\n', cat_contents)
# cleanup
reset_all_logger_handlers(target_file_config)
reset_all_logger_handlers(target_file_cat)
Config log contents:
 There has been an issue with the config

CAT log contents:
 There was a critical issue in CAT

One can also modify other things within the loggers of different modules, for example by adding filters or setting levels for individual loggers. You can read more about these things at https://realpython.com/python-logging-source-code/.
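
As a rough illustration of levels and filters, here is a minimal sketch that uses only the standard library. The logger name demo_pkg.vocab, the ListHandler helper and the SkipKeyword filter are hypothetical and not part of MedCAT. The same calls (setLevel and addFilter) should apply to any module-level logger.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: collect messages in a list so they are easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class SkipKeyword(logging.Filter):
    """Hypothetical filter: drop every record whose message contains the keyword."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword

    def filter(self, record):
        return self.keyword not in record.getMessage()


# hypothetical module-level logger
module_logger = logging.getLogger("demo_pkg.vocab")
handler = ListHandler()
module_logger.addHandler(handler)

# only let ERROR and above through for this one logger
module_logger.setLevel(logging.ERROR)
module_logger.warning("below the level, so this is dropped")
module_logger.error("this one is kept")

# additionally drop messages that mention a noisy keyword
module_logger.addFilter(SkipKeyword("heartbeat"))
module_logger.error("heartbeat failed")  # dropped by the filter
module_logger.error("disk full")  # kept

print(handler.messages)
```

Only the two error messages that pass both the level check and the filter end up in the handler.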