How to read MDF files in Python
This notebook shows how to read an MDF file with the Python mdf_reader package. First, all the data is read at once. Next, it is demonstrated how to select a subset of channels so that the reading time is reduced significantly.
But first, let's start with importing some modules and locating the data file.
import os
import funcy
import matplotlib.pyplot as plt
import seaborn as sns
import mdf_reader.mdf_parser as mdf
sns.set("notebook")
file_name = "data/AMS_BALDER_110225T233000_UTC222959.mdf"
if not os.path.exists(file_name):
    # Depending on where the notebook is run from (inside the example directory or in the root),
    # we need to go one directory level up
    file_name = os.path.join("..", file_name)
We are ready to read the MDF data by simply creating an MDFParser object with the file_name as input. We wrap the reader in funcy.print_durations() only to show how long it takes to import all the data; this is not required.
print("Reading the mdf file {}".format(file_name))
# Add the Timer only to show how long the reader takes
with funcy.print_durations("MDFParser"):
    mdf_obj = mdf.MDFParser(mdf_file=file_name)
print("Done")
mdf_obj.data.info()
Reading the mdf file ../data/AMS_BALDER_110225T233000_UTC222959.mdf
3.43 s in MDFParser
Done
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45000 entries, 2011-02-25 23:30:00 to 2011-02-25 23:59:59.960000
Columns: 115 entries, GPS_GGAQ to GPS_LongitudeDez
dtypes: float64(94), int16(3), int64(11), uint16(6), uint64(1)
memory usage: 37.5 MB
Reading an MDF file takes quite some time because of the binary conversion that takes place under the hood. In this case the reading time was 3.43 s. After reading, all the data is stored in a pandas DataFrame which can be accessed as mdf_obj.data. We have currently loaded all 116 channels: 115 data columns plus the DateTime channel, which serves as the index of the DataFrame.
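If funcy is not available, the timing can be reproduced with the standard library. The following is a minimal sketch, not the funcy implementation; the helper name print_elapsed and its output format are assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def print_elapsed(label):
    """Print the wall-clock time spent in the enclosed block.

    Hypothetical helper for illustration; not part of funcy or mdf_reader.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed time even if the block raised an exception
        elapsed = time.perf_counter() - start
        print(f"{elapsed:.2f} s in {label}")


# Usage: wrap any slow statement, e.g. the MDFParser call from the notebook
with print_elapsed("summing"):
    total = sum(range(1_000_000))
```

Any block of code can be wrapped this way, so the same helper could time the header read and the selective import further on.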
We can have a look at the contents of the columns by using the make_report() method. This prints a table to the screen with five columns:
- A counter referring to the index of the table
- An index referring to the position of the channel in the DataFrame
- A flag Loaded to show if this channel has been loaded (now 1 for all channels)
- The (unique) record name, which is also used to refer to a channel in the DataFrame
- The label of the record giving a small description of the channel.
names = mdf_obj.make_report()
cnt index Loaded : Name : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
0 0 1 : GPS_GGAQ : Quality indicator
1 1 1 : GPS_GpsHour : Hour
2 2 1 : GPS_GpsMin : Minute
3 3 1 : GPS_GpsSec : Second
4 4 1 : GPS_Modeindicator : Mode indicator
5 5 1 : GPS_NoOfSats : Numer of satellites
6 6 1 : Huisman_ComCheckCounter : Com check counter
7 7 1 : Huisman_Spare1 : Spare1
8 8 1 : S_Day : Day
9 9 1 : S_Hour : Hour
10 10 1 : S_Minutes : Minutes
11 11 1 : S_Month : Month
12 12 1 : S_Seconds : Seconds
13 13 1 : S_Year : Year
14 14 1 : GPS_TrueCOG : True COG
15 15 1 : DateTime :
16 16 1 : S_FrameCounter : FrameCounter
17 17 1 : Huisman_Spare2 : Spare2
18 18 1 : DLS_ErrorCode : ErrorCode
19 19 1 : GPS_ErrorCode : ErrorCode
20 20 1 : GPS_WindAngle : Anemometer Wind Angle
21 21 1 : MRU_ErrorCode : ErrorCode
22 22 1 : Huisman_HotLoad : Hot load
23 23 1 : Huisman_LoaderAngle : Loader Angle
24 24 1 : Huisman_LoaderLoad : Loader load
25 25 1 : Huisman_TowerAngle : Tower angle
26 26 1 : Huisman_TravelingBlockLoad : Traveling lock load
27 27 1 : Huisman_TravelingBlockPosition : Traveling lock position
28 28 1 : BALDER__UI0_A101_VI1 : HZ3 - SG 01
29 29 1 : BALDER__UI0_A102_VI1 : HZ3 - SG 02
30 30 1 : BALDER__UI0_A103_VI1 : HZ3 - SG 03
31 31 1 : BALDER__UI0_A104_VI1 : HZ3 - SG 04
32 32 1 : BALDER__UI0_A105_VI1 : RX4 - SG 05
33 33 1 : BALDER__UI0_A106_VI1 : RX4 - SG 06
34 34 1 : BALDER__UI0_A107_VI1 : RX4 - SG 07
35 35 1 : BALDER__UI0_A108_VI1 : RX4 - SG 08
36 36 1 : BALDER__UI0_A109_VI1 : SL1 - SG 09
37 37 1 : BALDER__UI0_A110_VI1 : SL1 - SG 10
38 38 1 : BALDER__UI0_A111_VI1 : SL1 - SG 11
39 39 1 : BALDER__UI0_A112_VI1 : SL1 - SG 12
40 40 1 : BALDER__UI0_A113_VI1 : Q02 - SG 16
41 41 1 : BALDER__UI0_A114_VI1 : Q02 - SG 17
42 42 1 : BALDER__UI0_A115_VI1 : Q02 - SG 18
43 43 1 : BALDER__UI0_A116_VI1 : Q02 - SG 19
44 44 1 : BALDER__UI1_A117_VI1 : AD (SB) - SG 20
45 45 1 : BALDER__UI1_A118_VI1 : AD (SB) - SG 21
46 46 1 : BALDER__UI1_A119_VI1 : AD (SB) - SG 22
47 47 1 : BALDER__UI1_A120_VI1 : AD (SB) - SG 23
48 48 1 : BALDER__UI1_A121_VI1 : MAIN BRACE - SG 26
49 49 1 : BALDER__UI1_A122_VI1 : MAIN BRACE - SG 27
50 50 1 : BALDER__UI1_A123_VI1 : MAIN BRACE - SG 28
51 51 1 : BALDER__UI1_A124_VI1 : MAIN BRACE - SG 29
52 52 1 : BALDER__UI1_A125_VI1 : PIVot (SB) - SG 13
53 53 1 : BALDER__UI1_A126_VI1 : PIVot (SB) - SG 14
54 54 1 : BALDER__UI1_A127_VI1 : PIVot (SB) - SG 15
55 55 1 : BALDER__UI1_A128_VI1 : AD (PS) - SG 24
56 56 1 : BALDER__UI1_A129_VI1 : AD (PS) - SG 25
57 57 1 : BALDER__UI1_A130_VI1 : HZ3 - AX - 02
58 58 1 : BALDER__UI1_A131_VI1 : HZ3 - AY - 03
59 59 1 : BALDER__UI1_A132_VI1 : RX4 - AX - 01
60 60 1 : BALDER__UI2_A133_VI1 : COFF SB FWD - AX - 04
61 61 1 : BALDER__UI2_A134_VI1 : COFF SB FWD - AY - 05
62 62 1 : BALDER__UI2_A135_VI1 : COFF SB FWD - AZ- 06
63 63 1 : BALDER__UI2_A136_VI1 : MAIN BRACE - AX - 07
64 64 1 : BALDER__UI2_A137_VI1 : MAIN BRACE - AZ - 08
65 65 1 : BALDER__UI2_A138_VI1 : Tower Head (PS) - AX - 10
66 66 1 : BALDER__UI2_A139_VI1 : Tower Head (PS) - AY - 11
67 67 1 : BALDER__UI2_A140_VI1 : Tower Head (SB) - AY - 09
68 68 1 : BALDER__UI2_A141_VI1 : TSM - ACC 12 - AX
69 69 1 : BALDER__UI2_A142_VI1 : TSM - ACC 13 - AY
70 70 1 : BALDER__UI2_A143_VI1 : TSM - ACC 14 - AZ
71 71 1 : BALDER__UI2_A144_VI1 : SPARE 01
72 72 1 : DLS_Draught : Draught
73 73 1 : DLS_GoGL : GoGL
74 74 1 : DLS_GoGT : GoGT
75 75 1 : DLS_Heel : Heel
76 76 1 : DLS_kxx : kxx
77 77 1 : DLS_kyy : kyy
78 78 1 : DLS_kzz : kzz
79 79 1 : DLS_Lcg : LCG
80 80 1 : DLS_Mass : Mass
81 81 1 : DLS_PSBoomAng : PS_BoomAngle
82 82 1 : DLS_PSHookSpd : PS_HookSpeed
83 83 1 : DLS_PSJiAng : PS_JiAngle
84 84 1 : DLS_PSLoad : PS_Load
85 85 1 : DLS_PSLoadMom : PS_LoadMom
86 86 1 : DLS_PSOutreach : PS_Radius
87 87 1 : DLS_PSSelected : PS_HookID
88 88 1 : DLS_PSSideLead : PS_SideLead
89 89 1 : DLS_PSSlewAng : PS_SlewAngle
90 90 1 : DLS_PSSlewSpd : PS_SlewSpeed
91 91 1 : DLS_Realtime_simulation : DLS_Simul
92 92 1 : DLS_SBBoomAng : SB_BoomAngle
93 93 1 : DLS_SBHookSpd : SB_HookSpeed
94 94 1 : DLS_SBLoad : SB_Load
95 95 1 : DLS_SBLoadMom : SB_LoadMom
96 96 1 : DLS_SBOutreach : SB_Radius
97 97 1 : DLS_SBSelected : SB_HookID
98 98 1 : DLS_SBSideLead : SB_SideLead
99 99 1 : DLS_SBSlewAng : SB_SlewAngle
100 100 1 : DLS_SBSlewSpd : SB_SlewSpeed
101 101 1 : DLS_Tcg : TCG
102 102 1 : DLS_Trim : Trim
103 103 1 : DLS_Vcg : VCG
104 104 1 : GPS_Altitude : GPS_Altitude
105 105 1 : GPS_Currentheading : Gyro heading
106 106 1 : GPS_ROT : Gyro_ROT
107 107 1 : GPS_SOGkn : SOG kn
108 108 1 : GPS_WindSpeed : Anemometer Wind Speed
109 109 1 : MRU_Date1970 : Date1970
110 110 1 : MRU_Heave : Heave
111 111 1 : MRU_Roll : Roll
112 112 1 : GPS_GGAHDOP : GPS-HDOP
113 113 1 : MRU_Pitch : Pitch
114 114 1 : GPS_LatitudeDez : Latitude Decimal
115 115 1 : GPS_LongitudeDez : Longitude Decimal
As already mentioned, you can refer to the DataFrame as mdf_obj.data, and since this is a pandas DataFrame, we can use the pandas method describe() to get some statistical information on a channel. For instance, the MRU_Roll channel has the following statistics:
mdf_obj.data["MRU_Roll"].describe()
count    45000.00000
mean         0.01136
std          0.00306
min          0.00254
25%          0.01049
50%          0.01219
75%          0.01341
max          0.01772
Name: MRU_Roll, dtype: float64
Use the values attribute to get the raw numpy array data. The first five values of the roll are for instance
mdf_obj.data["MRU_Roll"].values[:5]
array([0.01207, 0.01207, 0.01207, 0.01204, 0.01204])
These values can also be shown the pandas way using the head() method. This displays the first rows of the column data together with the index, which by default is the date/time. Here we demonstrate how to show the head of multiple columns. Note that you pass the column selection as a list, hence the double [].
mdf_obj.data[["MRU_Roll", "MRU_Pitch", "MRU_Heave"]].head(5)
DateTime | MRU_Roll | MRU_Pitch | MRU_Heave |
---|---|---|---|
2011-02-25 23:30:00.000 | 0.01207 | -0.000187 | -0.1051 |
2011-02-25 23:30:00.040 | 0.01207 | -0.000187 | -0.1051 |
2011-02-25 23:30:00.080 | 0.01207 | -0.000187 | -0.1051 |
2011-02-25 23:30:00.120 | 0.01204 | -0.000259 | -0.1078 |
2011-02-25 23:30:00.160 | 0.01204 | -0.000259 | -0.1078 |
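The difference between single and double brackets can be tried out without the MDF file. The sketch below rebuilds the five rows shown above by hand (the 40 ms index step is read off the table); the variable names are chosen for illustration only.

```python
import pandas as pd

# Rebuild the first five samples from the table above (25 Hz, i.e. a 40 ms step)
index = pd.date_range("2011-02-25 23:30:00", periods=5, freq="40ms", name="DateTime")
frame = pd.DataFrame(
    {
        "MRU_Roll": [0.01207, 0.01207, 0.01207, 0.01204, 0.01204],
        "MRU_Pitch": [-0.000187, -0.000187, -0.000187, -0.000259, -0.000259],
        "MRU_Heave": [-0.1051, -0.1051, -0.1051, -0.1078, -0.1078],
    },
    index=index,
)

# Single brackets with one name return a Series
roll = frame["MRU_Roll"]

# Double brackets (a list of names) return a DataFrame with only those columns
roll_pitch = frame[["MRU_Roll", "MRU_Pitch"]]

# The raw numpy array behind a column; to_numpy() is the spelling that newer
# pandas documentation recommends over the values attribute
roll_array = roll.to_numpy()

print(type(roll).__name__, type(roll_pitch).__name__, roll_array.shape)
```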
Speeding up the reading time
Quite often we are not interested in loading all the columns, but only need a selection. The mdf_parser makes this easier by allowing the reading of the MDF file to be split up: first the MDF header is imported, which is quick, and then we select the channels we need and import only those. This reduces the reading time significantly.
Selecting the columns based on a search string
Start with reading the header of the file
# first read the header data only to get the column names
print("Reading header of {}".format(file_name))
with funcy.print_durations(f"MDFParser reading {file_name}"):
    mdf_object = mdf.MDFParser(file_name, verbose=1, import_data=False)
print("Done reading the header")
Reading header of ../data/AMS_BALDER_110225T233000_UTC222959.mdf
12.74 ms in MDFParser reading ../data/AMS_BALDER_110225T233000_UTC222959.mdf
Done reading the header
Since the import_data flag was set to False, the reading took only a few milliseconds. We can show the available data columns again by using the make_report() method, just as we did above. The only difference is that the Loaded flag will be set to false.
Now that we know which columns are available, we can make a selection by using the set_column_selection method. Suppose we want to plot the roll along the journey:
mdf_object.set_column_selection(filter_list=["MRU_Roll"], include_date_time=True)
# now do the actual import
print("Importing the data from {}".format(file_name))
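with funcy.print_durations("MDFParser"):
    mdf_object.import_data()
print("Done")
# Note: this inspects mdf_obj, the full import from the first section, which is why the
# output below still lists 115 columns; use mdf_object.data.info() to inspect the selection
mdf_obj.data.info()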
names2 = mdf_object.make_report(show_loaded_data_only=True)
Importing the data from ../data/AMS_BALDER_110225T233000_UTC222959.mdf
cnt index Loaded : Name : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
0 15 1 : DateTime :
1 111 1 : MRU_Roll : Roll
83.71 ms in MDFParser
Done
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 45000 entries, 2011-02-25 23:30:00 to 2011-02-25 23:59:59.960000
Columns: 115 entries, GPS_GGAQ to GPS_LongitudeDez
dtypes: float64(94), int16(3), int64(11), uint16(6), uint64(1)
memory usage: 37.5 MB
The include_date_time flag was added to automatically include the DateTime field if available. The DateTime will be assigned to the index of the DataFrame, so we can now plot the roll against the date/time.
# plot the Roll vs. the index
mdf_object.data.plot(y=["MRU_Roll"])
plt.ylabel('Roll [deg]')
plt.show()
The reading time has dropped from 3.43 s for the full import to about 84 ms for the selection. The DateTime channel was imported implicitly (we did not mention it in the filter list), which is why the roll can be plotted against the date.
Selecting the columns based on the column names and labels
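To put the timings printed above into perspective, the overall gain can be worked out by hand. The sketch below only does the arithmetic on the three durations reported in this notebook; they are single-run measurements, so the resulting factor is indicative rather than a benchmark.

```python
# Durations as printed by funcy.print_durations in this notebook (single runs)
full_read_s = 3.43          # reading all channels at once
header_read_s = 12.74e-3    # reading the header only (import_data=False)
selected_read_s = 83.71e-3  # importing DateTime and MRU_Roll

# The two-step approach costs the header read plus the selective import
two_step_total_s = header_read_s + selected_read_s
speedup = full_read_s / two_step_total_s

print(f"two-step total: {two_step_total_s * 1e3:.2f} ms, speed-up: {speedup:.1f}x")
```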
The make_report() output shows that each channel in the MDF file has a unique name which is not very self-explanatory (such as BALDER__UI1_A130_VI1) and a label with a description, such as HZ3 - AX - 02, which denotes the acceleration in x direction at the HZ3 location. We can also filter on the label, since the filter_list argument is a list of regular expressions. The first regular expression is always applied to the channel name, whereas all following regular expressions are applied to the channel labels. So we can select all the HZ3 accelerations by doing
mdf_object.set_column_selection(filter_list=["^BALDER", "HZ3.*A[XYZ]"])
mdf_object.import_data()
names3 = mdf_object.make_report(show_loaded_data_only=True)
cnt index Loaded : Name : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
0 15 1 : DateTime :
1 57 1 : BALDER__UI1_A130_VI1 : HZ3 - AX - 02
2 58 1 : BALDER__UI1_A131_VI1 : HZ3 - AY - 03
3 111 1 : MRU_Roll : Roll
So in this case we first selected all the channels with a record name matching ^BALDER (the ^ symbol matches the start of a string), and from that list we selected all the channels with a label matching HZ3.*A[XYZ]. Note that .* matches a string of arbitrary length (possibly empty) of any characters, and [XYZ] matches a single X, Y or Z character. With this filter list we have now loaded the HZ3 AX and AY components (AZ is not available). Note that the MRU_Roll channel is still loaded from the previous round.
We have used the make_report() method again to show the contents of the mdf_object data. The show_loaded_data_only flag suppresses all the columns which are available in the MDF file but have not been loaded yet. The column names in the table shown by make_report() are returned as a list, which is stored here in names3. We can use this list to look up the DataFrame column names we want to use. In this case, we extract the names of the HZ3 AX and AY columns, which are stored at indices 1 and 2 of the list (as you can see from the cnt column in the table), and plot the data:
hz3x_name = names3[1]
hz3y_name = names3[2]
ax = mdf_object.data.plot(y=[hz3x_name, hz3y_name])
# this is only required to replace the default labels (which are the column names) with
# something meaningful
lines, labels = ax.get_legend_handles_labels()
ax.legend(lines[:2], ["HZ3 - AX", "HZ3 - AY"], loc="best")
plt.show()
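An alternative to editing the legend afterwards is to rename the columns before plotting, so that every plot picks up the readable labels automatically. The sketch below uses a tiny hand-made DataFrame with placeholder values (not measured data) and a hand-written name-to-label mapping; only DataFrame.rename is relied upon.

```python
import pandas as pd

# Placeholder values for illustration only; the column names are those of the HZ3 channels
frame = pd.DataFrame(
    {
        "BALDER__UI1_A130_VI1": [0.0, 0.1, 0.2],
        "BALDER__UI1_A131_VI1": [1.0, 1.1, 1.2],
    }
)

# Hand-written mapping from record name to a short, readable label
readable = {
    "BALDER__UI1_A130_VI1": "HZ3 - AX",
    "BALDER__UI1_A131_VI1": "HZ3 - AY",
}

# rename() returns a new DataFrame; the original column names stay untouched
labelled = frame.rename(columns=readable)
print(list(labelled.columns))
# -> ['HZ3 - AX', 'HZ3 - AY']
```

Calling labelled.plot() would then use the readable labels in the legend without any further legend handling.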
We can extend the filter_list with more label search patterns if we want. For instance, if you want to load all the channels from the RX4 and the Tower, you can do
mdf_object.set_column_selection(filter_list=["^BALDER", "Tower.*", "RX4"])
mdf_object.import_data()
mdf_object.make_report(show_loaded_data_only=True)
cnt index Loaded : Name : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
0 15 1 : DateTime :
1 32 1 : BALDER__UI0_A105_VI1 : RX4 - SG 05
2 33 1 : BALDER__UI0_A106_VI1 : RX4 - SG 06
3 34 1 : BALDER__UI0_A107_VI1 : RX4 - SG 07
4 35 1 : BALDER__UI0_A108_VI1 : RX4 - SG 08
5 57 1 : BALDER__UI1_A130_VI1 : HZ3 - AX - 02
6 58 1 : BALDER__UI1_A131_VI1 : HZ3 - AY - 03
7 59 1 : BALDER__UI1_A132_VI1 : RX4 - AX - 01
8 65 1 : BALDER__UI2_A138_VI1 : Tower Head (PS) - AX - 10
9 66 1 : BALDER__UI2_A139_VI1 : Tower Head (PS) - AY - 11
10 67 1 : BALDER__UI2_A140_VI1 : Tower Head (SB) - AY - 09
11 111 1 : MRU_Roll : Roll
['DateTime', 'BALDER__UI0_A105_VI1', 'BALDER__UI0_A106_VI1', 'BALDER__UI0_A107_VI1', 'BALDER__UI0_A108_VI1', 'BALDER__UI1_A130_VI1', 'BALDER__UI1_A131_VI1', 'BALDER__UI1_A132_VI1', 'BALDER__UI2_A138_VI1', 'BALDER__UI2_A139_VI1', 'BALDER__UI2_A140_VI1', 'MRU_Roll']
You can see that both the SB and PS Tower channels are loaded and also the RX4. The HZ3 was loaded in the previous call already. In this way, you can extend the channels being loaded.
Remember that the first pattern in the filter_list (^BALDER) applies to the name of the record, and all the following patterns apply to the labels. In case you want to select multiple columns based on the name field of the record, you have to use the regular expression operator |, which means or. For instance, to load the latitude and longitude:
mdf_object.set_column_selection(filter_list=["GPS_Lat.*|GPS_Lon.*"])
mdf_object.import_data()
mdf_object.make_report(show_loaded_data_only=True)
cnt index Loaded : Name : Label
--- ----- ------ : -------------------------------------------------- : ------------------------------
0 15 1 : DateTime :
1 32 1 : BALDER__UI0_A105_VI1 : RX4 - SG 05
2 33 1 : BALDER__UI0_A106_VI1 : RX4 - SG 06
3 34 1 : BALDER__UI0_A107_VI1 : RX4 - SG 07
4 35 1 : BALDER__UI0_A108_VI1 : RX4 - SG 08
5 57 1 : BALDER__UI1_A130_VI1 : HZ3 - AX - 02
6 58 1 : BALDER__UI1_A131_VI1 : HZ3 - AY - 03
7 59 1 : BALDER__UI1_A132_VI1 : RX4 - AX - 01
8 65 1 : BALDER__UI2_A138_VI1 : Tower Head (PS) - AX - 10
9 66 1 : BALDER__UI2_A139_VI1 : Tower Head (PS) - AY - 11
10 67 1 : BALDER__UI2_A140_VI1 : Tower Head (SB) - AY - 09
11 111 1 : MRU_Roll : Roll
12 114 1 : GPS_LatitudeDez : Latitude Decimal
13 115 1 : GPS_LongitudeDez : Longitude Decimal
['DateTime', 'BALDER__UI0_A105_VI1', 'BALDER__UI0_A106_VI1', 'BALDER__UI0_A107_VI1', 'BALDER__UI0_A108_VI1', 'BALDER__UI1_A130_VI1', 'BALDER__UI1_A131_VI1', 'BALDER__UI1_A132_VI1', 'BALDER__UI2_A138_VI1', 'BALDER__UI2_A139_VI1', 'BALDER__UI2_A140_VI1', 'MRU_Roll', 'GPS_LatitudeDez', 'GPS_LongitudeDez']
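The effect of the | operator in the name pattern can be checked in isolation with Python's re module. The sketch below applies the same pattern to a handful of record names from the report; using re.search for the match is an assumption about how the pattern is applied.

```python
import re

# A few record names taken from the make_report() table
names = ["GPS_GGAQ", "GPS_Altitude", "GPS_LatitudeDez", "GPS_LongitudeDez", "MRU_Roll"]

# Alternation: match names containing either GPS_Lat... or GPS_Lon...
pattern = re.compile("GPS_Lat.*|GPS_Lon.*")
position_channels = [name for name in names if pattern.search(name)]

# The same selection written with a group around the alternatives
grouped = re.compile("GPS_(Lat|Lon)")
position_channels_grouped = [name for name in names if grouped.search(name)]

print(position_channels)
# -> ['GPS_LatitudeDez', 'GPS_LongitudeDez']
```

The other GPS channels are left out because neither alternative matches their names.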
Now we have included the latitude and longitude columns. We can display some information on the locations as follows:
mdf_object.data[["GPS_LatitudeDez", "GPS_LongitudeDez"]].describe()
Statistic | GPS_LatitudeDez | GPS_LongitudeDez |
---|---|---|
count | 45000.000000 | 45000.000000 |
mean | -6.244657 | 10.729433 |
std | 0.000090 | 0.000139 |
min | -6.244780 | 10.729253 |
25% | -6.244736 | 10.729302 |
50% | -6.244671 | 10.729413 |
75% | -6.244571 | 10.729573 |
max | -6.244522 | 10.729637 |