How to read MDF files in Python

This notebook shows how to read an MDF file with the Python mdf_reader package. First, all the data is read at once. Next, we demonstrate how to select a subset of channels so that the reading time is reduced significantly.

But first, let's import some modules, set the plotting style, and locate the data file.

In [2]:
import os
import funcy
import matplotlib.pyplot as plt
import seaborn as sns

import mdf_reader.mdf_parser as mdf

sns.set("notebook")

file_name = "data/AMS_BALDER_110225T233000_UTC222959.mdf"
if not os.path.exists(file_name):
    # Depending on where we run the notebook from (inside the example directory or in the root),
    # we may need to go one directory level up
    file_name = os.path.join("..", file_name)
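The check above tries a single fallback location. If the notebook is run from more places, a small helper that searches several candidate directories could be used instead. This is a minimal sketch; `find_data_file` and its `search_roots` parameter are hypothetical names, not part of mdf_reader.

```python
from pathlib import Path


def find_data_file(relative_path, search_roots=(".", "..")):
    """Return the first existing root/relative_path (hypothetical helper).

    Roots are tried in order, so the current directory wins over its parent.
    """
    for root in search_roots:
        candidate = Path(root) / relative_path
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        f"{relative_path} not found in any of {list(search_roots)}"
    )


# Usage in this notebook (assumes the data directory layout used above):
# file_name = find_data_file("data/AMS_BALDER_110225T233000_UTC222959.mdf")
```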

We are now ready to read the MDF data by creating an MDFParser object with file_name as input. The funcy.print_durations() context manager around the reader only shows how long it takes to import all the data; it is not required.
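If you would rather not depend on funcy just for timing, a similar context manager can be written with the standard library. This is a sketch; `timed` is a hypothetical name, and unlike print_durations it also hands back the elapsed time so it can be reused in code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print how long the with-block took, similar to funcy.print_durations (sketch)."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{result['seconds']:8.3f} s in {label}")


# Usage, mirroring the next cell:
# with timed("MDFParser"):
#     mdf_obj = mdf.MDFParser(mdf_file=file_name)
```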

In [3]:
print("Reading the mdf file {}".format(file_name))
# print_durations is only used to show how long the reader takes
with funcy.print_durations("MDFParser"):
    mdf_obj = mdf.MDFParser(mdf_file=file_name)
print("Done")

mdf_obj.data.info()
Reading the mdf file ../data/AMS_BALDER_110225T233000_UTC222959.mdf
    3.43 s in MDFParser
Done
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45000 entries, 2011-02-25 23:30:00 to 2011-02-25 23:59:59.960000
Columns: 115 entries, GPS_GGAQ to GPS_LongitudeDez
dtypes: float64(94), int16(3), int64(11), uint16(6), uint64(1)
memory usage: 37.5 MB

Reading an MDF file takes some time because the binary data has to be converted under the hood; in this case, reading took 3.43 s. After reading, all the data is stored in a pandas DataFrame, available as mdf_obj.data. We have now loaded all 116 channels: 115 data columns plus the DateTime channel, which is used as the index.
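To get a feeling for what binary conversion involves, the sketch below decodes fixed-size records from raw bytes with a NumPy structured dtype. The record layout (a float64 time, a float32 roll value and a uint16 satellite count) is made up for illustration; it is not the actual layout of this MDF file and not how mdf_reader is implemented.

```python
import numpy as np

# Made-up, packed record layout (14 bytes per record); NOT the real MDF layout.
record_dtype = np.dtype([
    ("time", "<f8"),    # seconds since the start of the recording
    ("roll", "<f4"),    # roll value
    ("n_sats", "<u2"),  # number of satellites
])


def decode_records(raw: bytes) -> np.ndarray:
    """Interpret raw bytes as an array of fixed-size records."""
    n_records = len(raw) // record_dtype.itemsize
    return np.frombuffer(raw, dtype=record_dtype, count=n_records)


# Round trip: encode three records to bytes, then decode them again.
original = np.zeros(3, dtype=record_dtype)
original["time"] = [0.0, 0.04, 0.08]
original["roll"] = [0.01207, 0.01207, 0.01204]
original["n_sats"] = [9, 9, 9]
raw = original.tobytes()

decoded = decode_records(raw)
print(record_dtype.itemsize)  # 14
print(decoded["n_sats"])      # [9 9 9]
```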

We can inspect the available columns with the make_report() method. It prints a table with five columns:

  1. cnt: a counter referring to the row of the table
  2. index: the position of the channel in the file's channel list
  3. Loaded: a flag showing whether the channel has been loaded (here all set to 1, i.e. true)
  4. Name: the (unique) record name, which is also used as the column name in the DataFrame
  5. Label: the label of the record, giving a short description of the channel
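Each row of the report can be thought of as one record per channel. As a sketch of that structure (`ChannelInfo` and `format_report` are hypothetical names, not the library's implementation), the same table layout can be produced like this:

```python
from dataclasses import dataclass


@dataclass
class ChannelInfo:
    index: int    # position of the channel in the file's channel list
    loaded: bool  # whether the channel's data has been imported
    name: str     # unique record name, used as DataFrame column name
    label: str    # short description


def format_report(channels):
    """Format channels in the same column layout as make_report() (sketch)."""
    lines = [
        f"{'cnt':>3} {'index':>5} {'Loaded':>6} : {'Name':<50} : Label",
        f"{'-' * 3} {'-' * 5} {'-' * 6} : {'-' * 50} : {'-' * 30}",
    ]
    for cnt, ch in enumerate(channels):
        lines.append(
            f"{cnt:>3} {ch.index:>5} {int(ch.loaded):>6} : {ch.name:<50} : {ch.label}"
        )
    return "\n".join(lines)


print(format_report([
    ChannelInfo(15, True, "DateTime", ""),
    ChannelInfo(111, True, "MRU_Roll", "Roll"),
]))
```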
In [4]:
names = mdf_obj.make_report()
cnt index Loaded : Name                                               : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
  0     0      1 : GPS_GGAQ                                           : Quality indicator
  1     1      1 : GPS_GpsHour                                        : Hour
  2     2      1 : GPS_GpsMin                                         : Minute
  3     3      1 : GPS_GpsSec                                         : Second
  4     4      1 : GPS_Modeindicator                                  : Mode indicator
  5     5      1 : GPS_NoOfSats                                       : Numer of satellites
  6     6      1 : Huisman_ComCheckCounter                            : Com check counter
  7     7      1 : Huisman_Spare1                                     : Spare1
  8     8      1 : S_Day                                              : Day
  9     9      1 : S_Hour                                             : Hour
 10    10      1 : S_Minutes                                          : Minutes
 11    11      1 : S_Month                                            : Month
 12    12      1 : S_Seconds                                          : Seconds
 13    13      1 : S_Year                                             : Year
 14    14      1 : GPS_TrueCOG                                        : True COG
 15    15      1 : DateTime                                           :
 16    16      1 : S_FrameCounter                                     : FrameCounter
 17    17      1 : Huisman_Spare2                                     : Spare2
 18    18      1 : DLS_ErrorCode                                      : ErrorCode
 19    19      1 : GPS_ErrorCode                                      : ErrorCode
 20    20      1 : GPS_WindAngle                                      : Anemometer Wind Angle
 21    21      1 : MRU_ErrorCode                                      : ErrorCode
 22    22      1 : Huisman_HotLoad                                    : Hot load
 23    23      1 : Huisman_LoaderAngle                                : Loader Angle
 24    24      1 : Huisman_LoaderLoad                                 : Loader load
 25    25      1 : Huisman_TowerAngle                                 : Tower angle
 26    26      1 : Huisman_TravelingBlockLoad                         : Traveling lock load
 27    27      1 : Huisman_TravelingBlockPosition                     : Traveling lock position
 28    28      1 : BALDER__UI0_A101_VI1                               : HZ3 - SG 01
 29    29      1 : BALDER__UI0_A102_VI1                               : HZ3 - SG 02
 30    30      1 : BALDER__UI0_A103_VI1                               : HZ3 - SG 03
 31    31      1 : BALDER__UI0_A104_VI1                               : HZ3 - SG 04
 32    32      1 : BALDER__UI0_A105_VI1                               : RX4 - SG 05
 33    33      1 : BALDER__UI0_A106_VI1                               : RX4 - SG 06
 34    34      1 : BALDER__UI0_A107_VI1                               : RX4 - SG 07
 35    35      1 : BALDER__UI0_A108_VI1                               : RX4 - SG 08
 36    36      1 : BALDER__UI0_A109_VI1                               : SL1 - SG 09
 37    37      1 : BALDER__UI0_A110_VI1                               : SL1 - SG 10
 38    38      1 : BALDER__UI0_A111_VI1                               : SL1 - SG 11
 39    39      1 : BALDER__UI0_A112_VI1                               : SL1 - SG 12
 40    40      1 : BALDER__UI0_A113_VI1                               : Q02 - SG 16
 41    41      1 : BALDER__UI0_A114_VI1                               : Q02 - SG 17
 42    42      1 : BALDER__UI0_A115_VI1                               : Q02 - SG 18
 43    43      1 : BALDER__UI0_A116_VI1                               : Q02 - SG 19
 44    44      1 : BALDER__UI1_A117_VI1                               : AD (SB) - SG 20
 45    45      1 : BALDER__UI1_A118_VI1                               : AD (SB) - SG 21
 46    46      1 : BALDER__UI1_A119_VI1                               : AD (SB) - SG 22
 47    47      1 : BALDER__UI1_A120_VI1                               : AD (SB) - SG 23
 48    48      1 : BALDER__UI1_A121_VI1                               : MAIN BRACE - SG 26
 49    49      1 : BALDER__UI1_A122_VI1                               : MAIN BRACE - SG 27
 50    50      1 : BALDER__UI1_A123_VI1                               : MAIN BRACE - SG 28
 51    51      1 : BALDER__UI1_A124_VI1                               : MAIN BRACE - SG 29
 52    52      1 : BALDER__UI1_A125_VI1                               : PIVot (SB) - SG 13
 53    53      1 : BALDER__UI1_A126_VI1                               : PIVot (SB) - SG 14
 54    54      1 : BALDER__UI1_A127_VI1                               : PIVot (SB) - SG 15
 55    55      1 : BALDER__UI1_A128_VI1                               : AD (PS) - SG 24
 56    56      1 : BALDER__UI1_A129_VI1                               : AD (PS) - SG 25
 57    57      1 : BALDER__UI1_A130_VI1                               : HZ3 - AX - 02
 58    58      1 : BALDER__UI1_A131_VI1                               : HZ3 - AY - 03
 59    59      1 : BALDER__UI1_A132_VI1                               : RX4 - AX - 01
 60    60      1 : BALDER__UI2_A133_VI1                               : COFF SB FWD - AX - 04
 61    61      1 : BALDER__UI2_A134_VI1                               : COFF SB FWD - AY - 05
 62    62      1 : BALDER__UI2_A135_VI1                               : COFF SB FWD - AZ- 06
 63    63      1 : BALDER__UI2_A136_VI1                               : MAIN BRACE - AX - 07
 64    64      1 : BALDER__UI2_A137_VI1                               : MAIN BRACE - AZ - 08
 65    65      1 : BALDER__UI2_A138_VI1                               : Tower Head (PS) - AX - 10
 66    66      1 : BALDER__UI2_A139_VI1                               : Tower Head (PS) - AY - 11
 67    67      1 : BALDER__UI2_A140_VI1                               : Tower Head (SB) - AY - 09
 68    68      1 : BALDER__UI2_A141_VI1                               : TSM - ACC 12 - AX
 69    69      1 : BALDER__UI2_A142_VI1                               : TSM - ACC 13 - AY
 70    70      1 : BALDER__UI2_A143_VI1                               : TSM - ACC 14 - AZ
 71    71      1 : BALDER__UI2_A144_VI1                               : SPARE 01
 72    72      1 : DLS_Draught                                        : Draught
 73    73      1 : DLS_GoGL                                           : GoGL
 74    74      1 : DLS_GoGT                                           : GoGT
 75    75      1 : DLS_Heel                                           : Heel
 76    76      1 : DLS_kxx                                            : kxx
 77    77      1 : DLS_kyy                                            : kyy
 78    78      1 : DLS_kzz                                            : kzz
 79    79      1 : DLS_Lcg                                            : LCG
 80    80      1 : DLS_Mass                                           : Mass
 81    81      1 : DLS_PSBoomAng                                      : PS_BoomAngle
 82    82      1 : DLS_PSHookSpd                                      : PS_HookSpeed
 83    83      1 : DLS_PSJiAng                                        : PS_JiAngle
 84    84      1 : DLS_PSLoad                                         : PS_Load
 85    85      1 : DLS_PSLoadMom                                      : PS_LoadMom
 86    86      1 : DLS_PSOutreach                                     : PS_Radius
 87    87      1 : DLS_PSSelected                                     : PS_HookID
 88    88      1 : DLS_PSSideLead                                     : PS_SideLead
 89    89      1 : DLS_PSSlewAng                                      : PS_SlewAngle
 90    90      1 : DLS_PSSlewSpd                                      : PS_SlewSpeed
 91    91      1 : DLS_Realtime_simulation                            : DLS_Simul
 92    92      1 : DLS_SBBoomAng                                      : SB_BoomAngle
 93    93      1 : DLS_SBHookSpd                                      : SB_HookSpeed
 94    94      1 : DLS_SBLoad                                         : SB_Load
 95    95      1 : DLS_SBLoadMom                                      : SB_LoadMom
 96    96      1 : DLS_SBOutreach                                     : SB_Radius
 97    97      1 : DLS_SBSelected                                     : SB_HookID
 98    98      1 : DLS_SBSideLead                                     : SB_SideLead
 99    99      1 : DLS_SBSlewAng                                      : SB_SlewAngle
100   100      1 : DLS_SBSlewSpd                                      : SB_SlewSpeed
101   101      1 : DLS_Tcg                                            : TCG
102   102      1 : DLS_Trim                                           : Trim
103   103      1 : DLS_Vcg                                            : VCG
104   104      1 : GPS_Altitude                                       : GPS_Altitude
105   105      1 : GPS_Currentheading                                 : Gyro heading
106   106      1 : GPS_ROT                                            : Gyro_ROT
107   107      1 : GPS_SOGkn                                          : SOG kn
108   108      1 : GPS_WindSpeed                                      : Anemometer Wind Speed
109   109      1 : MRU_Date1970                                       : Date1970
110   110      1 : MRU_Heave                                          : Heave
111   111      1 : MRU_Roll                                           : Roll
112   112      1 : GPS_GGAHDOP                                        : GPS-HDOP
113   113      1 : MRU_Pitch                                          : Pitch
114   114      1 : GPS_LatitudeDez                                    : Latitude Decimal
115   115      1 : GPS_LongitudeDez                                   : Longitude Decimal

As already mentioned, you can refer to the DataFrame as mdf_obj.data. Since this is a pandas DataFrame, we can use the pandas describe() method to get some statistics on a channel. For instance, the MRU_Roll channel has the following statistics:

In [5]:
mdf_obj.data["MRU_Roll"].describe()
Out[5]:
count    45000.00000
mean         0.01136
std          0.00306
min          0.00254
25%          0.01049
50%          0.01219
75%          0.01341
max          0.01772
Name: MRU_Roll, dtype: float64
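describe() works on any pandas Series or DataFrame, so it can be tried without the MDF file. A minimal sketch on a synthetic Series, using five roll values from this file as stand-in data; individual statistics can then be picked from the result by label:

```python
import pandas as pd

# Synthetic stand-in for mdf_obj.data["MRU_Roll"]: five samples, 40 ms apart
index = pd.date_range("2011-02-25 23:30:00", periods=5, freq="40ms", name="DateTime")
roll = pd.Series([0.01207, 0.01207, 0.01207, 0.01204, 0.01204],
                 index=index, name="MRU_Roll")

stats = roll.describe()  # count, mean, std, min, 25%, 50%, 75%, max
median_roll = stats["50%"]
max_roll = stats["max"]
print(stats)
```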

Use the values attribute to get the raw NumPy array. For instance, the first five values of the roll are:

In [6]:
mdf_obj.data["MRU_Roll"].values[:5]
Out[6]:
array([0.01207, 0.01207, 0.01207, 0.01204, 0.01204])
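pandas also offers Series.to_numpy(), which the pandas documentation recommends over .values; for a float64 column both give the same NumPy array contents. A small sketch with a synthetic Series holding the values shown above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for mdf_obj.data["MRU_Roll"]
roll = pd.Series([0.01207, 0.01207, 0.01207, 0.01204, 0.01204], name="MRU_Roll")

first_five_values = roll.values[:5]     # as in the cell above
first_five_numpy = roll.to_numpy()[:5]  # equivalent, recommended spelling
print(first_five_numpy)
```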

These values can also be shown the pandas way, with the head() method, which shows the first rows of the selected columns together with the index (here the DateTime). Here we show the head of multiple columns. Note that the column selection is passed as a list, hence the double brackets [[...]].

In [7]:
mdf_obj.data[["MRU_Roll", "MRU_Pitch", "MRU_Heave"]].head(5)
Out[7]:
                         MRU_Roll  MRU_Pitch  MRU_Heave
DateTime
2011-02-25 23:30:00.000   0.01207  -0.000187    -0.1051
2011-02-25 23:30:00.040   0.01207  -0.000187    -0.1051
2011-02-25 23:30:00.080   0.01207  -0.000187    -0.1051
2011-02-25 23:30:00.120   0.01204  -0.000259    -0.1078
2011-02-25 23:30:00.160   0.01204  -0.000259    -0.1078
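The double brackets matter for the return type: a single column name gives a Series, while a list of names gives a DataFrame (even for a one-element list). A sketch with synthetic data built from the rows shown above:

```python
import pandas as pd

# Synthetic stand-in for mdf_obj.data, using the first five rows shown above
index = pd.date_range("2011-02-25 23:30:00", periods=5, freq="40ms", name="DateTime")
data = pd.DataFrame(
    {
        "MRU_Roll": [0.01207, 0.01207, 0.01207, 0.01204, 0.01204],
        "MRU_Pitch": [-0.000187, -0.000187, -0.000187, -0.000259, -0.000259],
        "MRU_Heave": [-0.1051, -0.1051, -0.1051, -0.1078, -0.1078],
    },
    index=index,
)

roll = data["MRU_Roll"]                    # single brackets -> Series
motions = data[["MRU_Roll", "MRU_Pitch"]]  # list of names -> DataFrame
print(type(roll).__name__, type(motions).__name__)  # Series DataFrame
print(motions.head(2))
```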

Speeding up the reading time

Quite often we do not need all the columns, only a selection. The mdf_parser supports this by splitting the reading of the MDF file into two steps: first the MDF header is imported, which is fast, and then we select the channels we need and import only those. This reduces the reading time significantly.
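Why can selecting channels save so much time? With fixed-size records, one channel can be decoded on its own without converting the others. The sketch below illustrates the idea with a made-up 14-byte record layout; it is not the real MDF layout and not necessarily how mdf_reader works. `LAYOUT` and `read_channel` are hypothetical names.

```python
import numpy as np

RECORD_SIZE = 14  # bytes per record in this made-up layout
# Made-up layout: channel name -> (byte offset inside a record, dtype)
LAYOUT = {
    "time": (0, "<f8"),
    "roll": (8, "<f4"),
    "n_sats": (12, "<u2"),
}


def read_channel(raw: bytes, name: str) -> np.ndarray:
    """Decode a single channel from interleaved records, skipping the others."""
    offset, fmt = LAYOUT[name]
    # Structured dtype describing only the requested field; the remaining
    # bytes of each record are treated as padding and never converted.
    single_field = np.dtype({"names": [name], "formats": [fmt],
                             "offsets": [offset], "itemsize": RECORD_SIZE})
    records = np.frombuffer(raw, dtype=single_field)
    return records[name].copy()  # contiguous copy of just this channel


# Build raw bytes for three records with the full (packed) dtype.
full_dtype = np.dtype([(key, fmt) for key, (_, fmt) in LAYOUT.items()])
sample = np.zeros(3, dtype=full_dtype)
sample["time"] = [0.0, 0.04, 0.08]
sample["roll"] = [0.01207, 0.01207, 0.01204]
sample["n_sats"] = [9, 9, 9]
raw = sample.tobytes()

roll = read_channel(raw, "roll")
print(roll)
```

Note that this sketch still keeps all bytes in memory; it only avoids converting the channels that were not requested.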

Selecting the columns based on a search string

Start by reading only the header of the file:

In [8]:
# First read only the header to get the channel names
print("Reading header of {}".format(file_name))
with funcy.print_durations(f"MDFParser reading {file_name}"):
    mdf_object = mdf.MDFParser(file_name, verbose=1, import_data=False)

print("Done reading the header")
Reading header of ../data/AMS_BALDER_110225T233000_UTC222959.mdf
   12.74 ms in MDFParser reading ../data/AMS_BALDER_110225T233000_UTC222959.mdf
Done reading the header

Since import_data was set to False, reading the header took only about 13 ms. We can list the available channels again with the make_report() method, just as we did above. The only difference is that the Loaded flag will now be false for all channels.

Now that we know which columns are available, we can select some of them with the set_column_selection method. Suppose we want to plot the roll over the recording period:

In [9]:
mdf_object.set_column_selection(filter_list=["MRU_Roll"], include_date_time=True)

# now do the actual import
print("Importing the data from {}".format(file_name))
with funcy.print_durations("MDFParser"):
    mdf_object.import_data()

print("Done")
mdf_object.data.info()
names2 = mdf_object.make_report(show_loaded_data_only=True)
Importing the data from ../data/AMS_BALDER_110225T233000_UTC222959.mdf
cnt index Loaded : Name                                               : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
  0    15      1 : DateTime                                           :
  1   111      1 : MRU_Roll                                           : Roll
   83.71 ms in MDFParser
Done

The include_date_time flag makes sure the DateTime channel is included as well, if available. DateTime is assigned to the index of the DataFrame, so we can now plot the roll against the date and time:

In [10]:
# Plot the roll against the DateTime index
mdf_object.data.plot(y=["MRU_Roll"])
plt.ylabel('Roll [deg]')
plt.show()
[Figure: Roll [deg] (MRU_Roll) vs. DateTime]

The reading time has dropped from 3.43 s to about 84 ms. DateTime was imported even though it is not in the filter list, thanks to include_date_time=True, which is why we can plot the roll against the date.
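The time axis of this file runs from 23:30:00 to 23:59:59.960 with 45000 entries, one sample every 40 ms (25 Hz). A minimal sketch, with pandas only and synthetic values, of how such a DatetimeIndex can be built and how a DateTime column becomes the index with set_index(); this is not necessarily how mdf_reader constructs it.

```python
import pandas as pd

# 45000 samples, 40 ms apart, starting at the first timestamp of the file
time_axis = pd.date_range("2011-02-25 23:30:00", periods=45000, freq="40ms")
print(time_axis[-1])  # 2011-02-25 23:59:59.960000

# A DataFrame with DateTime as an ordinary column (synthetic roll values)...
frame = pd.DataFrame({"DateTime": time_axis, "MRU_Roll": 0.012})
# ...becomes time-indexed with set_index; DataFrame.plot() then uses it as the x-axis
frame = frame.set_index("DateTime")
```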

Selecting the columns based on the column names and labels

The make_report() output shows that each channel in the MDF file has a unique name that is not very self-explanatory (such as BALDER__UI1_A130_VI1) and a label with a description, such as HZ3 - AX - 02, which is the acceleration in x direction at the HZ3 location. Because the filter_list argument is a list of regular expressions, we can also filter on the label: the first regular expression is always applied to the channel name, whereas all the following regular expressions are applied to the channel labels. So we can select all the HZ3 accelerations with:

In [11]:
mdf_object.set_column_selection(filter_list=["^BALDER", "HZ3.*A[XYZ]"])
mdf_object.import_data()
names3 = mdf_object.make_report(show_loaded_data_only=True)
cnt index Loaded : Name                                               : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
  0    15      1 : DateTime                                           :
  1    57      1 : BALDER__UI1_A130_VI1                               : HZ3 - AX - 02
  2    58      1 : BALDER__UI1_A131_VI1                               : HZ3 - AY - 03
  3   111      1 : MRU_Roll                                           : Roll

In this case, we first selected all the channels whose record name starts with BALDER (the ^ symbol matches the start of a string), and from that list we selected all the channels whose label matches HZ3.*A[XYZ]. Here .* matches a string of arbitrary length (possibly empty) of any characters, and [XYZ] matches a single X, Y, or Z character. With this filter list we have loaded the HZ3 AX and AY components (an AZ component is not available). Note that the MRU_Roll column is still loaded from the previous call.
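The individual regular-expression pieces can be checked with Python's re module. A short sketch using labels copied from the report and the data file name, which contains BALDER but does not start with it:

```python
import re

labels = ["HZ3 - SG 01", "HZ3 - AX - 02", "HZ3 - AY - 03", "RX4 - AX - 01"]
label_pattern = re.compile(r"HZ3.*A[XYZ]")
hz3_accelerations = [label for label in labels if label_pattern.search(label)]
print(hz3_accelerations)  # ['HZ3 - AX - 02', 'HZ3 - AY - 03']

# ^ anchors the match at the start of the string
file_stem = "AMS_BALDER_110225T233000_UTC222959.mdf"
anchored = re.search(r"^BALDER", file_stem)    # None: BALDER is not at the start
unanchored = re.search(r"BALDER", file_stem)   # matches anywhere in the string
record_match = re.search(r"^BALDER", "BALDER__UI1_A130_VI1")  # matches
```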

We used the make_report() method again to show the contents of the mdf_object data. The show_loaded_data_only flag suppresses all the columns that are available in the MDF file but have not been loaded yet. make_report() also returns the column names shown in the table as a list (names3 here), which we can use to access the DataFrame columns. The names of the HZ3 AX and AY columns are at index 1 and 2 of this list (see the cnt column in the table). Get the names and plot the data:

In [12]:
hz3x_name = names3[1]
hz3y_name = names3[2]
ax = mdf_object.data.plot(y=[hz3x_name, hz3y_name])

# This is only required to replace the default legend labels (the column names)
# with something meaningful
lines, labels = ax.get_legend_handles_labels()
ax.legend(lines[:2], ["HZ3 - AX", "HZ3 - AY"], loc="best")
plt.show()
[Figure: HZ3 - AX and HZ3 - AY vs. DateTime]
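Picking names3[1] and names3[2] relies on the row order of the report. An alternative, sketched here with plain pandas and synthetic data, is to keep a mapping from column name to a readable label, select by name, and rename the columns before plotting, so the legend needs no manual fix. The mapping below is written by hand from the report table; it is not produced by mdf_reader.

```python
import pandas as pd

# Readable legend labels per column name (written by hand from the report)
legend_labels = {
    "BALDER__UI1_A130_VI1": "HZ3 - AX",
    "BALDER__UI1_A131_VI1": "HZ3 - AY",
}

# Synthetic stand-in for mdf_object.data
index = pd.date_range("2011-02-25 23:30:00", periods=3, freq="40ms", name="DateTime")
data = pd.DataFrame(
    {
        "BALDER__UI1_A130_VI1": [0.0, 0.1, 0.2],
        "BALDER__UI1_A131_VI1": [0.0, -0.1, -0.2],
        "MRU_Roll": [0.01207, 0.01207, 0.01207],
    },
    index=index,
)

# Select by name instead of by position, then rename for the legend
to_plot = data[list(legend_labels)].rename(columns=legend_labels)
# to_plot.plot() would now show "HZ3 - AX" and "HZ3 - AY" in the legend
```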

We can extend the filter_list with more label search patterns if we want. For instance, to load all the channels from the RX4 and the Tower, use:

In [13]:
mdf_object.set_column_selection(filter_list=["^BALDER", "Tower.*", "RX4"])
mdf_object.import_data()
mdf_object.make_report(show_loaded_data_only=True)
cnt index Loaded : Name                                               : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
  0    15      1 : DateTime                                           :
  1    32      1 : BALDER__UI0_A105_VI1                               : RX4 - SG 05
  2    33      1 : BALDER__UI0_A106_VI1                               : RX4 - SG 06
  3    34      1 : BALDER__UI0_A107_VI1                               : RX4 - SG 07
  4    35      1 : BALDER__UI0_A108_VI1                               : RX4 - SG 08
  5    57      1 : BALDER__UI1_A130_VI1                               : HZ3 - AX - 02
  6    58      1 : BALDER__UI1_A131_VI1                               : HZ3 - AY - 03
  7    59      1 : BALDER__UI1_A132_VI1                               : RX4 - AX - 01
  8    65      1 : BALDER__UI2_A138_VI1                               : Tower Head (PS) - AX - 10
  9    66      1 : BALDER__UI2_A139_VI1                               : Tower Head (PS) - AY - 11
 10    67      1 : BALDER__UI2_A140_VI1                               : Tower Head (SB) - AY - 09
 11   111      1 : MRU_Roll                                           : Roll
Out[13]:
['DateTime',
 'BALDER__UI0_A105_VI1',
 'BALDER__UI0_A106_VI1',
 'BALDER__UI0_A107_VI1',
 'BALDER__UI0_A108_VI1',
 'BALDER__UI1_A130_VI1',
 'BALDER__UI1_A131_VI1',
 'BALDER__UI1_A132_VI1',
 'BALDER__UI2_A138_VI1',
 'BALDER__UI2_A139_VI1',
 'BALDER__UI2_A140_VI1',
 'MRU_Roll']

Both the SB and PS Tower channels are now loaded, as well as the RX4 channels. The HZ3 channels were already loaded by the previous call, so each call extends the set of loaded channels.
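The selection rule described in this section (the first pattern on the record name, the remaining patterns on the labels) can be sketched in plain Python. `select_channels` is a hypothetical re-implementation for illustration, not mdf_reader code, and it assumes two things the text does not state: patterns are matched with re.search, and a channel is kept when its label matches any of the label patterns (consistent with the Tower/RX4 result above). Unlike set_column_selection, it does not keep channels selected by earlier calls.

```python
import re

# (name, label) pairs copied from the make_report table
CHANNELS = [
    ("BALDER__UI0_A101_VI1", "HZ3 - SG 01"),
    ("BALDER__UI0_A105_VI1", "RX4 - SG 05"),
    ("BALDER__UI1_A130_VI1", "HZ3 - AX - 02"),
    ("BALDER__UI1_A131_VI1", "HZ3 - AY - 03"),
    ("BALDER__UI1_A132_VI1", "RX4 - AX - 01"),
    ("BALDER__UI2_A138_VI1", "Tower Head (PS) - AX - 10"),
    ("MRU_Roll", "Roll"),
    ("GPS_LatitudeDez", "Latitude Decimal"),
    ("GPS_LongitudeDez", "Longitude Decimal"),
]


def select_channels(channels, filter_list):
    """Return the names of the channels selected by filter_list (sketch).

    filter_list[0] is matched against the record name; any further patterns
    are matched against the label, and one matching label pattern is enough.
    """
    name_pattern, *label_patterns = filter_list
    selected = [(name, label) for name, label in channels
                if re.search(name_pattern, name)]
    if label_patterns:
        selected = [(name, label) for name, label in selected
                    if any(re.search(p, label) for p in label_patterns)]
    return [name for name, _ in selected]


print(select_channels(CHANNELS, ["^BALDER", "HZ3.*A[XYZ]"]))
print(select_channels(CHANNELS, ["^BALDER", "Tower.*", "RX4"]))
```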

Remember that the first pattern in the filter_list (^BALDER) applies to the record name, and all the following patterns apply to the labels. To select multiple columns based on the record name, you therefore have to use the regular-expression alternation operator |, which means "or". For instance, to load the latitude and longitude:

In [14]:
mdf_object.set_column_selection(filter_list=["GPS_Lat.*|GPS_Lon.*"])
mdf_object.import_data()
mdf_object.make_report(show_loaded_data_only=True)
cnt index Loaded : Name                                               : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
  0    15      1 : DateTime                                           :
  1    32      1 : BALDER__UI0_A105_VI1                               : RX4 - SG 05
  2    33      1 : BALDER__UI0_A106_VI1                               : RX4 - SG 06
  3    34      1 : BALDER__UI0_A107_VI1                               : RX4 - SG 07
  4    35      1 : BALDER__UI0_A108_VI1                               : RX4 - SG 08
  5    57      1 : BALDER__UI1_A130_VI1                               : HZ3 - AX - 02
  6    58      1 : BALDER__UI1_A131_VI1                               : HZ3 - AY - 03
  7    59      1 : BALDER__UI1_A132_VI1                               : RX4 - AX - 01
  8    65      1 : BALDER__UI2_A138_VI1                               : Tower Head (PS) - AX - 10
  9    66      1 : BALDER__UI2_A139_VI1                               : Tower Head (PS) - AY - 11
 10    67      1 : BALDER__UI2_A140_VI1                               : Tower Head (SB) - AY - 09
 11   111      1 : MRU_Roll                                           : Roll
 12   114      1 : GPS_LatitudeDez                                    : Latitude Decimal
 13   115      1 : GPS_LongitudeDez                                   : Longitude Decimal
Out[14]:
['DateTime',
 'BALDER__UI0_A105_VI1',
 'BALDER__UI0_A106_VI1',
 'BALDER__UI0_A107_VI1',
 'BALDER__UI0_A108_VI1',
 'BALDER__UI1_A130_VI1',
 'BALDER__UI1_A131_VI1',
 'BALDER__UI1_A132_VI1',
 'BALDER__UI2_A138_VI1',
 'BALDER__UI2_A139_VI1',
 'BALDER__UI2_A140_VI1',
 'MRU_Roll',
 'GPS_LatitudeDez',
 'GPS_LongitudeDez']
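Since only the first pattern is applied to record names, several names have to be combined into that one pattern. The sketch below checks the alternation used above with Python's re module, and shows a hypothetical helper, `names_to_pattern`, that joins a list of exact names with |. Passing its result as the first filter_list entry should then select exactly those names, assuming the library matches name patterns the way re.search does.

```python
import re


def names_to_pattern(names):
    """Build one regex matching any of the given exact names (hypothetical helper)."""
    return "^(?:" + "|".join(re.escape(name) for name in names) + ")$"


all_names = ["GPS_GGAQ", "GPS_LatitudeDez", "GPS_LongitudeDez", "GPS_SOGkn", "MRU_Roll"]

# The pattern used above: an alternation of two prefixes
prefix_pattern = re.compile(r"GPS_Lat.*|GPS_Lon.*")
by_prefix = [n for n in all_names if prefix_pattern.search(n)]

# Exact names joined with | into a single pattern
exact_pattern = names_to_pattern(["GPS_LatitudeDez", "MRU_Roll"])
by_exact_name = [n for n in all_names if re.search(exact_pattern, n)]
print(by_prefix, by_exact_name)
```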

Now the latitude and longitude columns are included as well. We can display some statistics on the locations as follows:

In [15]:
mdf_object.data[["GPS_LatitudeDez", "GPS_LongitudeDez"]].describe()
Out[15]:
       GPS_LatitudeDez  GPS_LongitudeDez
count     45000.000000      45000.000000
mean         -6.244657         10.729433
std           0.000090          0.000139
min          -6.244780         10.729253
25%          -6.244736         10.729302
50%          -6.244671         10.729413
75%          -6.244571         10.729573
max          -6.244522         10.729637