Getting Started¶
Using Python and other open-source tools is great only if you get sufficiently acquainted with the ecosystem. Otherwise, an online search frenzy, followed closely by frustration, is usually just around the corner.
Hence, before we move on to the harold documentation, some general information about the related concepts might soften the fall.
Python and its scientific stack¶
For the newcomer, a slightly distorted story of the tools required to work with harold might save some headaches. To emphasize: the story is roughly correct but precisely informative!
Since the late 70s, an enormous amount of effort has gone into numerical computing. For reasons that are really not relevant now, Fortran (yes, that old weirdo of a language) is still considered one of the fastest, if not the fastest, platforms in terms of runtime performance. On top of this performance, people with great numerical algebra expertise have been optimizing these implementations ever since.
The result is now known as BLAS, LAPACK, and other key libraries, optimized to the bone, almost literally by manipulating memory addresses manually. Consequently, many commercial software suites somehow find a way to utilize this performance even though they are not written in Fortran directly. You have to appreciate how generous the authors were, and that is why I’m humbly replicating their style with the MIT license.
Hence, without knowing it, you have mostly been using compiled Fortran code through various front ends, including matlab and other software. However, this high performance doesn’t come for free. The price to pay is being extremely well-versed in low-level operations to benefit from these tools. Hence, every language/software designs a front end that communicates with the user, understands the context (say, a matrix turns out to be triangular), prepares the data in a very strict format, and sends it to these libraries. In turn, it picks up the result, shuffles the output format, and converts it into something the user can work with further.
In Python, this front end is called the scientific stack, that is NumPy and SciPy. These involve C and Fortran bindings to low-level tools and wrap them with the typical easy-to-use Python syntax.
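As a tiny illustration of this hand-off (not harold-specific, just SciPy calling into LAPACK), you can even tell the front end about the structure of the problem so that it dispatches to a specialized routine; a minimal sketch with a made-up matrix:
import numpy as np
from scipy import linalg

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # a symmetric positive definite matrix
b = np.array([1.0, 2.0])

x_generic = linalg.solve(A, b)                    # general-purpose LAPACK solver
x_informed = linalg.solve(A, b, assume_a='pos')   # declare the structure; a Cholesky-based routine is used

print(np.allclose(x_generic, x_informed))         # True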
That’s why, if you are not a faint-hearted user and are installing everything yourself, you have to provide a compiler capable of building these libraries. You might have noticed that some sources ship precompiled flavors of NumPy and SciPy to ease the pain of building the libraries from scratch. While these are very useful, if the compiler does not like your particular system, the result is often a glorious crash. Hence, it is more of an art than following a manual.
Note
The compiler problem is almost universally a Windows problem, since Windows doesn’t ship with a proper compiler (microsoft being microsoft). The unfortunate users (including me) who have no idea what a compiler even is have to find one that crashes depending on the weather conditions of the installation date.
For this reason, some groups have come up with precompiled packages with matched versions, ready to be installed without bothering the user, such as Anaconda, pythonxy, etc. See the installation page of the scientific stack and cross your fingers.
Note
Python itself is also mostly written in C, hence the story is more involved, but for math-oriented users this is not relevant at this point.
Python and its strange syntax¶
Python did not start its life as a programming language involving a mathematics module. It actually took quite some time to reach the current situation, in which it arguably dominates the data science and machine learning fields. Consequently, most of the common english keyboard characters are already reserved for other purposes in the language. Even worse, most of the brackets and delimiters are used for built-in structures. For example, the matlab-native square brackets [...] are used to create objects that go by the name of lists. Hence, the scientific stack creators had to come up with a more verbose and, frankly, kind of annoying syntax compared to the native syntax of the well-known scientific computing software suites. As an example, the moment you open up a Python console, you need to import the NumPy module via import numpy and only then do you obtain array manipulations and math functions:
import numpy

a = numpy.array([1, 2, 3, 4])            # a one-dimensional array with shape (4,)
b = numpy.atleast_2d(a)                  # a 2D row vector with shape (1, 4)
c = a.reshape(4, 1)                      # a 2D column vector with shape (4, 1)
d = numpy.array([[1], [2], [3], [4]])    # creates a tall vector directly
Note
The situation is slightly more complicated than this because what the first command above creates is a one-dimensional array. That is to say, matlab users will be baffled when they try to transpose it: it will spit out exactly the same array and not the tall vector. The reason is that the array theoretically has only one dimension, hence the transpose is not well-defined as there is no second dimension to swap with. There are, of course, ways to handle this, but none of them is fun.
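A minimal sketch of the baffling behavior described above:
import numpy

a = numpy.array([1, 2, 3, 4])
print(a.T)                   # still the same 1D array; transposing it changes nothing
print(a.reshape(-1, 1))      # one way to get the (4, 1) tall vector
print(a[:, numpy.newaxis])   # another way, by inserting a new axis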
Note
There is also a limited built-in math library in Python, and it is not to be confused with the scientific libraries of Python.
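A quick sketch of the difference:
import math
import numpy

print(math.sqrt(9.0))                        # 3.0; the built-in math module works on scalars only
print(numpy.sqrt(numpy.array([1.0, 4.0])))   # [1. 2.]; NumPy works elementwise on arrays
# math.sqrt(numpy.array([1.0, 4.0])) would raise a TypeError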
It was actually in 2014 that the Python developers finally(!!) decided to reserve a symbol for matrix multiplication, namely the @ character (again, almost everything else is reserved, the core Python developers don’t care that much about math stuff, and the regular * is meant for element-wise multiplication). Otherwise, matrix multiplication is a function with a surprisingly accurate name: a.dot(b). However, this becomes über-tedious if a chain of matrices is multiplied, especially since we don’t have the amazing (I mean it) backslash operator of matlab, as in A\B*(C+D*E)*F, e.g.,
x = numpy.linalg.solve(A,B).dot(C+D.dot(E)).dot(F)
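With the @ operator available (Python 3.5 and later), the same chain reads a bit closer to the math; here is a minimal sketch with some made-up square matrices:
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D, E, F = (rng.standard_normal((3, 3)) for _ in range(6))

# matlab-style A\B*(C+D*E)*F, written without ever forming an explicit inverse
x_dot = np.linalg.solve(A, B).dot(C + D.dot(E)).dot(F)
x_at = np.linalg.solve(A, B) @ (C + D @ E) @ F

print(np.allclose(x_dot, x_at))  # True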
Warning
Never, ever, never use inv()
in your computations on any software
(unless you want to see what the inverse is explicitly). You have been warned
with a colored box.
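To make the warning concrete, a small sketch of the preferred alternative:
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_bad = np.linalg.inv(A) @ b    # forms the explicit inverse: generally slower and less accurate
x_good = np.linalg.solve(A, b)  # solves the linear system directly

print(np.allclose(x_bad, x_good))  # True here, but solve() is the numerically sane habit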
But in turn you probably get the most sought-after operation: continuous slicing and chained calls (I don’t know what nerds call this). What I mean is: imagine you want to slice a matrix, then do something with a slice of another matrix, then index the result... and so on.
m = A[:4,:3].dot(B[2:5,:])[3,2].dot(C)
This example can be made arbitrarily long and complicated, so don’t get fooled by counter-arguments: there is no substitute for this, and once you get the hang of it, it pays off. For some reason, NumPy/SciPy/IPython prefer the Mathematica syntax for linear algebra objects and I have no idea why. Arguably, Mathematica is the least matrix-algebra-friendly software suite, but anyway. There is no decision to be made here unless we recode the array parsing parts from scratch.
The indexing of arrays starts from zero and not one!! This debate is stupid and I don’t care; I recommend you don’t either. Computer scientists are not the best people to design front ends. You wouldn’t ask the owner of Facebook to meet your best friends now, would you? It would be a guaranteed disaster.
Moreover, array indexing is half-open: in math notation, whatever range is given is used as [a, b). In english, the notation [1, 5] is meant to be read as from one up to five, excluding five. Sometimes this makes things extremely natural and convenient; sometimes you might consider watching paint dry instead of debugging mismatched array sizes.
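A tiny sketch of both points, zero-based indexing and half-open slicing:
import numpy as np

a = np.array([10, 20, 30, 40, 50])
print(a[0])    # 10 -- indexing starts at zero
print(a[1:5])  # [20 30 40 50] -- start at index 1, stop before index 5
print(a[1:3])  # [20 30] -- exactly 3 - 1 = 2 elements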
Another point that might annoy users is the complex number syntax. In Python, you are obliged to use the letter j for the complex unit and cannot use i.
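For example:
z = 3 + 4j                     # fine: j (or J) denotes the imaginary unit
# z = 3 + 4i                   # SyntaxError: the letter i is not recognized
print(z.real, z.imag, abs(z))  # 3.0 4.0 5.0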
Hopefully, this convinces you that the Python framework is not yet another matlab clone. I cannot do justice to all the nuances, since not only do I lack the resources, but some of the stuff still doesn’t make too much sense to me. But in defense of Python, most of the stuff in matlab never made sense to me either. So it is a major upgrade.
An almost exhaustive cheat sheet for recovering matlab users¶
The following link is actually one of the first hits on any search engine, but here it is for completeness. Please spare some time to check out the differences between NumPy and matlab syntax. It might even teach you some matlab too.
Click here for “Numpy for matlab users”
Now, assuming that you have mastered the art of finding your way through a gazillion blogs, filtering StackOverflow nerd anger, and decrypting the NumPy documentation, let’s start doing some familiar things in harold.
Todo
Finish this as soon as possible
Some humble advice to Python beginners¶
Since Python is a programming language and not a front-end software package, the code we write usually needs to be executed somehow. Without going into the details, Python code needs to be interpreted (similar to matlab), as opposed to, say, compiling C code.
The native way of doing this is simply typing python on the command line, which turns it into a Python interpreter. However, this is hardly ever useful for any practical purpose, let alone for resuming or reproducing previous work. Having said that, you don’t need a scary Visual blabla suite that looks like an airplane cockpit either. There are many options to make your life easier:
- The first option is simply working with an editor, e.g., Spyder (I had a very positive experience with it), Eclipse, PyCharm, vim, emacs, and so on.
- The second is using the recent and very, very powerful Jupyter (previously known as IPython), which converts your web browser into a mathematica-like environment with explicit cells, in which you can even embed Youtube videos. It also works in the command window, and it is not limited to Python but, as the name implies, covers Julia, Python, R, and so on.
I would strongly recommend the Jupyter notebook option. It also makes sharing your work with others extremely easy. Please follow the link to Jupyter and install it to your liking.
Initializing harold¶
Once you have managed to make Jupyter work, you will have to import harold as a library. And when you do, you have to access the function names properly, depending on how you imported harold.
This point is pretty confusing and a source of heated arguments, but I’ll just cut to the chase. Almost all proper programming languages involve the concept of namespaces (you guessed correctly, matlab doesn’t have this). Sticking to the part that is relevant for us, this concept makes it possible to attach all the function names to a particular name family, represented by the dot notation, e.g., mypackage.myfunction.myattribute = 45 and so on.
One obvious reason for this is that separate name families avoid name clashes, which are a nightmare in matlab if you have two folders on the path and both contain variants of the same function: you can never be sure which one matlab is going to pick when you call that function from somewhere else.
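As a small illustration (using NumPy rather than matlab, of course):
import numpy as np

data = [0.1] * 10
r1 = sum(data)     # Python's built-in sum
r2 = np.sum(data)  # NumPy's sum, kept apart by the np. prefix

# With "from numpy import *", NumPy's sum would also be called plain "sum"
# and would shadow the built-in -- exactly the kind of clash namespaces avoid.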
Long story short, when you import harold you can simply write at the top of your notebook
import harold
and then it is possible to access the functions with the harold. prefix, such as, say, for frequency response calculations:
harold.frequency_response(G)
Alternatively, you can use an abbreviation for the package name
import harold as har
and then access the functions with the har. prefix. Lastly, there is another way which, as with almost everything involving professional programmers, is another battlefield: you can decide to skip the namespace and import all functions with their original names into the parent namespace:
from harold import *
This will scan the harold library and import every object whose name doesn’t start with _ or __. For interactive notebooks this is pretty convenient, as long as you are not also importing libraries that have similarly named functions (and, in turn, you can never be sure).
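If you are curious which names would be pulled in this way, here is a rough sketch (the exact list depends on your harold version):
import harold

public_names = [name for name in dir(harold) if not name.startswith('_')]
print(len(public_names))   # roughly how many objects "from harold import *" brings in
print(public_names[:10])   # a peek at the first few names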
In conclusion, if you don’t have any worries about name clashes, use the last syntax. The typical first cell of a notebook holds the import declarations. Here is the boilerplate code to start with:
import numpy as np
import scipy as sp
from harold import *
You can of course extend this to your liking with your own packages. Finally, let’s do some control stuff.