Bash Recipes for Doing Science!
When prototyping programs that deal with lots of data, whether on an Arduino, other embedded systems, or a full-blown computer, it’s really useful to have a quick tool for plotting the program’s output. Initially, I used Python for this. Python is a beautifully simple language, and between NumPy, SciPy and Matplotlib you can do pretty much anything you want with data, from simple plotting to running machine learning algorithms. However, when all you want is to quickly plot a text file containing some data, breaking out a text editor to write a Python script gets annoying, especially if you do it many times a day.
That was when I came across this interesting video from the Computerphile YouTube channel.
It’s a fantastic interview with Brian Kernighan where he talks about his work on Unix and in particular the awk command line tool. I’ve been using Linux as my primary operating system for quite some time now and I do a lot of programming in it, so I am fairly comfortable using the terminal for compiling and debugging code. But there was never a situation where I had to sit down and learn about all the terminal-based utilities that were available. As I was watching Brian Kernighan describe awk and how it works, I started to realize that I could really use this tool to my advantage when playing around with data. That was when I decided to take a closer look at all the tools available on the Linux terminal to see what I was missing out on.
Over time I’ve developed a few “recipes”, and I thought I’d share the ones I’ve found particularly useful.
1. Log and monitor a serial port
Logging and monitoring serial ports is a really common task, especially if you’re working with embedded systems like the Arduino. It’s a standard way of getting data off the microcontroller. Checking what’s being logged to the serial port is as simple as running:
cat /dev/ttyUSB0
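Sometimes the command exits immediately or shows garbled output (due to the wrong baud rate). You can use stty to change the serial port settings. In the command below, 115200 is the baud rate, which has to match the rate the device is transmitting at.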
stty -F /dev/ttyUSB0 115200 min 1
I can also log the data from the serial port conveniently to a text file,
cat /dev/ttyUSB0 > logfilename.txt
2. Log and monitor network ports
One of the ways to get data off an Arduino is using an XBee device to wirelessly transmit it. I’m a big fan of using the XBee WiFi module to log data to a UDP port on my laptop. To view incoming data from the UDP port, I use a really useful tool called netcat. Say the device is logging data to port 9751. I can listen in on the data by running,
netcat -ul 9751
and log the data if I like.
netcat -ul 9751 > log.txt
3. Plot data in a log file from the terminal
feedgnuplot is a really useful Perl script that can read data from stdin and pass it to gnuplot for plotting. The only requirement is that the data arrive in a specific format: one sample per line, with spaces between the data streams. This means the data in your file should look like this:
1.0 1.5 2.3
1.1 1.3 2.7
2.6 5.9 3.3
To plot the data:
cat log.txt | feedgnuplot --lines --autolegend
The --autolegend option automatically numbers each line in the graph. The --legend option can be used to add custom legends. I highly recommend going through the feedgnuplot manpage to find out about all the functionality that the script offers.
4. Plot only specific data
The awk command is useful for filtering data that comes into the program line by line. A simple way it can be applied is to plot only specific columns in the text data.
cat log.txt | awk '{ print $1, $2 }' | feedgnuplot --lines --autolegend
The awk command as shown here keeps only columns 1 and 2 of the log file and passes them on to feedgnuplot.
awk can also be used to do more complicated things, like selecting lines that contain only numbers or only text. This can be useful if the log file contains other debug output lines as well and you want to filter out and plot just the lines that contain numeric data.
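As a rough sketch of that kind of filter, the function below keeps a line only when every whitespace-separated field looks like a plain decimal number. The name numeric_only and the exact pattern are my own assumptions (it does not accept exponent notation such as 1e5, for instance), so adjust it to whatever your device actually prints.

```shell
# Hypothetical helper: pass through only the lines made up entirely of numbers.
numeric_only() {
    awk '{
        # Skip the whole line as soon as one field is not a plain decimal number.
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[-+]?[0-9]*\.?[0-9]+$/) next
        # Drop empty lines as well; they carry no sample.
        if (NF > 0) print
    }'
}
```

It slots into the pipeline like any other stage, for example: cat log.txt | numeric_only | feedgnuplot --lines --autolegend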
Wikipedia has a pretty good introduction to awk. I also found that a lot of the time, I could find what I needed for specific problems by searching Stack Overflow.
5. Process data before plotting
What if I have some raw data in a log file that I want to run through some processing (more complicated than an awk one-liner)? I write a simple Python script that reads lines from stdin and writes the processed samples to stdout. Then I can plot the result by doing
cat log.txt | ./process_script.py | feedgnuplot --lines --autolegend
or log it to another file.
cat log.txt | ./process_script.py > processed_data.txt
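The post does not show process_script.py itself. To illustrate the contract such a filter has to satisfy (one sample in on stdin, one processed sample out on stdout), here is a stand-in written as a shell function around awk rather than in Python. The name moving_average, the choice of a moving average over the first column, and the default window of 3 samples are all hypothetical.

```shell
# Hypothetical stand-in for a processing script:
# moving average of column 1 over the last n samples (default 3).
moving_average() {
    awk -v n="${1:-3}" '{
        slot = NR % n                  # slot in the circular window to overwrite
        sum += $1 - window[slot]       # add new sample, drop the one leaving (0 at start)
        window[slot] = $1
        count = (NR < n) ? NR : n      # window is not full for the first n-1 samples
        print sum / count
        fflush()                       # push each result out immediately
    }'
}
```

It can be used in place of the script, as in cat log.txt | moving_average 5 | feedgnuplot --lines. The fflush() call is there because output written to a pipe is normally buffered in blocks, which would make a live plot further down the pipeline update in bursts.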
6. Interpret packed binary data
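If the data in the log file, or the data coming in from a serial port or network interface, is in some packed binary format, there’s a handy tool called od that can interpret it on the fly.
The command below treats the incoming data as packets of 8 bytes and prints each packet on its own line as four 2-byte signed integers (-td2 selects 2-byte signed decimals, -w8 puts 8 bytes on each output line, and -An suppresses the address column).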
netcat -ul 9751 | od -An -td2 -w8
od is a versatile tool and, as is usual for Linux programs, I recommend reading the manpage to learn more about what it can do.
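A quick way to check the decoding without a device is to feed od a packet whose contents are known. The sketch below assumes GNU od; decode_packets is a hypothetical wrapper name, and the --endian=little option (not used in the command above) is added only so the result does not depend on the byte order of the machine running it.

```shell
# Hypothetical wrapper: decode 8-byte packets as four little-endian int16 values.
decode_packets() {
    od -An -td2 -w8 --endian=little
}

# One hand-built test packet, low byte first:
#   01 00 -> 1,  02 00 -> 2,  ff ff -> -1,  00 80 -> -32768
test_packet() {
    printf '\001\000\002\000\377\377\000\200'
}
```

Running test_packet | decode_packets prints the four values 1, 2, -1 and -32768 on a single line.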
7. Process/Plot live data streams
If I have data coming in from a serial port or from a network interface and I want to create a real-time plot, feedgnuplot has an option for that.
cat /dev/ttyUSB0 | feedgnuplot --lines --autolegend --xlen 100 --stream 0.1
The --xlen option plots a window of the last 100 samples and the --stream option updates the plot as new data comes in. The parameter 0.1 is the refresh interval in seconds.
I can even run the live data stream through my processing algorithm before plotting.
netcat -ul 9751 | ./process_script.py | feedgnuplot --lines --autolegend --xlen 100 --stream 0.1
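To try the live plot with no hardware attached, a small generator can stand in for the serial port or netcat. This is an addition of mine rather than one of the original recipes: fake_stream is a hypothetical helper that prints a sample index and its square, pausing between lines. Fractional delays assume a sleep that accepts them, as GNU sleep does.

```shell
# Hypothetical test source: emit COUNT lines of "index index^2",
# waiting DELAY seconds (default 0.1) after each one.
# The body runs in a subshell so its variables do not leak into the caller.
fake_stream() (
    count=$1
    delay=${2:-0.1}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$i $((i * i))"
        sleep "$delay"
        i=$((i + 1))
    done
)
```

For example: fake_stream 500 0.05 | feedgnuplot --lines --autolegend --xlen 100 --stream 0.1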
8. Redirect to multiple programs
Sometimes I want to monitor data coming in from a serial port and log it to a file at the same time. One way of doing this is to write a Python script that reads data from stdin, logs it to a file and also writes the same data to stdout.
cat /dev/ttyUSB0 | ./log_and_print.py | feedgnuplot --lines --xlen 100 --stream 0.1
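This is good if you only want to do one extra thing with the output. There is a better solution, however, that uses the tee command together with bash process substitution. tee copies its input to stdout and to every file named on its command line, and bash presents each >(command) to tee as one more file whose contents are piped into that command. The log file is an ordinary file, so tee takes it by name. Anything command1 or command2 print to stdout ends up in the same pipe that feeds feedgnuplot, so redirect their output if they print anything.
cat /dev/ttyUSB0 | tee logfile.txt >(command1) >(command2) | feedgnuplot --lines --autolegend --xlen 100 --stream 0.1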
This technique is quite versatile and can be used in many ways. For example, I can use tee to get the raw data from the serial port, plot it, pass it through a data processing script and plot the output of that result as well for a comparison.
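For the simplest case from the start of this section, logging while still passing the data on, tee on its own already does the job of the log-and-print script. A minimal sketch, where log_and_forward is a hypothetical name and the log path is whatever the caller passes in:

```shell
# Hypothetical helper: save an untouched copy of stdin to the file named in $1
# and forward every line to stdout for the next stage in the pipeline.
log_and_forward() {
    tee "$1"
}
```

For example: cat /dev/ttyUSB0 | log_and_forward raw.log | feedgnuplot --lines --xlen 100 --stream 0.1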
Summing up
I’m sure there are many more clever ways to combine and compose these commands to make prototyping easier, as well as commands that I don’t know about yet. If there is one thing I’ve learned after using Linux for a few years, it’s that it often has modest-looking command line tools that can do much more than a lot of GUI-based applications if you spend just a little time going through the manpage. I hope this post inspires others to take a second look at the free tools that come with most Linux distros. Some of them could really simplify your workflow!