
Raspberry Pi Weather Display

While going through some old project components, I found a cute little case for a Raspberry Pi and a TFT screen. Instead of allowing it to collect more dust, I decided to try to make something useful with it. The small size was perfect for some sort of informational display, so I decided to turn it into a weather display to keep by the door to remind me to take a coat or umbrella.

Finished Raspberry Pi weather display!

The display case was for a Raspberry Pi Model B (version 1!) and a 2.8″ TFT screen. I was able to find an old Model B and got started.

The first step was to get the PiTFT screen running, as I had no idea if it even worked. I started by installing Raspbian Bullseye on the Raspberry Pi but was unable to get anything to display on the screen. After digging in a bit more (and reading the manual), I found that these screens can have issues with Bullseye but often work on Raspbian Buster. I tried again with a fresh Raspbian Buster install but still had problems with the display not showing the desktop (though the console worked as expected). I was finally able to get the screen to work by installing Adafruit’s recommended lite distribution and then installing the PIXEL Desktop on top of it.

I then used the Adafruit Easy Install instructions to set up HDMI mirroring between the Raspberry Pi and an external monitor. It’s a good idea to make any final configuration changes that require the monitor’s higher resolution before running the easy install script, as the HDMI mirroring mode downscales the monitor to 640×480. This includes disabling any screensavers that could interfere with the display.

Once I had the desktop environment running, I tried out a few Linux desktop apps to see if they would work for my display. Sadly, most of the apps were designed for higher resolution screens which made them difficult to read on the TFT screen. GNOME Weather was almost good enough, but its lack of an auto-refresh feature made it infeasible for my project.

Close but no cigar: GNOME Weather on a Raspberry Pi

My next option was to build my own weather display application. I decided to use the OpenWeatherMap API, as its free tier had all of the data I needed and enough request quota for my purposes. I also wanted a set of icons for my display and found the open source weather-icons project, which contains icons for almost any weather condition imaginable (including aliens!).
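
For anyone curious, the request itself is simple. Here is a minimal sketch of the kind of call the display makes against OpenWeatherMap’s current weather endpoint; the API key and city below are placeholders.

```python
import requests

API_KEY = "YOUR_OPENWEATHERMAP_KEY"  # placeholder: free-tier key
CITY = "New York"                    # placeholder: city shown on the display

# Current-weather endpoint from the free tier
url = "https://api.openweathermap.org/data/2.5/weather"
params = {"q": CITY, "appid": API_KEY, "units": "imperial"}

data = requests.get(url, params=params, timeout=10).json()

# Pull out the fields the display cares about
condition = data["weather"][0]["main"]   # e.g. "Rain"
icon_code = data["weather"][0]["icon"]   # maps to a weather-icons glyph
temp = data["main"]["temp"]

print(f"{condition}, {temp:.0f}°F (icon {icon_code})")
```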

Once I had the data for the project, I started investigating how to build a graphical user interface for the display. After a false start with Python Tkinter, I decided to use Pygame. This was my first time using Pygame (or any Python GUI toolkit for that matter) but it was relatively easy to make progress with it. Although this framework is tailored towards building games, I found it to be quite effective for building the GUI for this project. After a bit of tinkering, I was able to build a customizable weather application for small displays. The code is available here.
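
The actual application does more (icons, forecasts, periodic refreshes), but the core Pygame pattern is roughly the sketch below: open a 320×240 surface, draw the current conditions, and flip the display on a timer. The condition and temperature here are hard-coded stand-ins for the API data.

```python
import pygame

pygame.init()
screen = pygame.display.set_mode((320, 240))  # PiTFT resolution
font = pygame.font.SysFont(None, 48)
clock = pygame.time.Clock()

condition, temp = "Rain", 54  # would come from the OpenWeatherMap call above

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    screen.fill((0, 0, 0))
    text = font.render(f"{condition}  {temp}°F", True, (255, 255, 255))
    screen.blit(text, (20, 100))
    pygame.display.flip()
    clock.tick(1)  # one frame per second is plenty for a weather display

pygame.quit()
```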

I then copied my code over to the Raspberry Pi and was able to see the screen in action! I made a few small display tweaks and then configured autostart to run the display program on startup.

Raspberry Pi weather display by the door

And that’s it! I now have a neat little weather display by my door and I was finally able to use some parts that I bought eight years ago!

Pedestrian Safety in Manhattan

For the final project in my Realtime and Big Data Analytics class at NYU, I worked on an analysis of the effectiveness of pedestrian safety measures in Manhattan with fellow students Rui Shen and Fei Guan. The main idea behind this project was to look at the number of accidents occurring within a fixed distance of an intersection in Manhattan and determine if the accident rate correlated with any features of the intersection, such as the presence of traffic signals or high traffic volume. We used a number of big data tools and techniques (like Apache Hadoop and MapReduce) to analyze this data and found some rather interesting results.

The first step was to collect data about intersections, accidents, and various features of the intersections. To do this, we relied heavily on open source data sets. We extracted the locations of intersections, speed bumps, and traffic signals from OpenStreetMap. We used NYC Department of Transportation data for traffic volume information, traffic signal locations, and traffic camera locations. Finally, we used NYC Open Data for information on accident counts and traffic volume, as well as the locations of speed bumps, arterial slow zones, and neighborhood slow zones. Some of the data could be used mostly off the shelf, but other datasets required further processing, such as normalizing traffic volume over time and geocoding the street addresses of traffic camera locations.
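
As one example of that processing, geocoding a camera’s street address just means turning the address string into coordinates. The snippet below is only an illustration of the idea using geopy’s Nominatim geocoder (not necessarily the exact tool we used), with a made-up address.

```python
from geopy.geocoders import Nominatim

# Illustrative only: turn a camera's street address into coordinates
geolocator = Nominatim(user_agent="pedestrian-safety-study")  # hypothetical user agent
location = geolocator.geocode("Broadway and W 42nd St, Manhattan, NY")
if location:
    print(location.latitude, location.longitude)
```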

The next step was to merge the feature and accident data with the relevant intersections. To do this, we used big data tools to assign intersection identifiers to every corresponding feature and accident record. As Hadoop can’t natively handle spatial data, we needed some additional tools to help us determine which features existed within an intersection. There were three distinct types of spatial data that we needed to process: point data (such as accidents), line data (such as traffic volume) and polygon data (such as neighborhood slow zones). Fortunately, GIS Tools for Hadoop helped us solve this problem. The GIS Tools implement many spatial operations on top of Hadoop, such as finding spatial geometry intersections, overlaps, and inclusions. This toolkit also includes User Defined Functions (UDFs) which can be used with Hive. For this task, we used Hive and the UDFs to associate the feature and accident data with the appropriate intersections. We experimented with different sizes of spatial buffers around an intersection and decided that a twenty-meter radius captured most of the related data points without overlapping with neighboring intersections.

Examples of the different types of spatial data that could exist within an intersection: area data (blue), point data (purple) and line data (green).
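
Our implementation ran inside Hive using the GIS Tools UDFs, but the underlying buffer test is easy to show in plain Python with shapely: buffer each intersection point by twenty meters and check whether a feature falls inside the buffer. The coordinates below are made-up projected (meter-based) values.

```python
from shapely.geometry import Point

# Illustrative coordinates in projected meters (not lat/lon)
intersection = Point(585000, 4511000)
accident = Point(585012, 4511008)

# Twenty-meter buffer around the intersection, mirroring the Hive UDF logic
buffer_zone = intersection.buffer(20)

if buffer_zone.contains(accident):
    print("accident assigned to this intersection")
```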

Once all of the relevant data had an intersection identifier assigned to it, we wrote a MapReduce job to aggregate the distinct datasets into a single dataset with all of an intersection’s feature information in one record. In the reduce stage, we examined all of the data for a given intersection and did some further reduction, such as normalizing the traffic volume value for the intersection or summing the accidents occurring within the intersection buffer.
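
Conceptually, the reduce stage looked something like the plain-Python sketch below (the real job was a Hadoop MapReduce job and the record layout here is simplified): group records by intersection identifier, then collapse each group into a single feature row.

```python
from collections import defaultdict

# Toy records: (intersection_id, record_type, value)
records = [
    (101, "accident", 1),
    (101, "accident", 1),
    (101, "traffic_volume", 1200),
    (102, "accident", 1),
]

# "Shuffle": group values by intersection id, as Hadoop does between map and reduce
grouped = defaultdict(list)
for key, rtype, value in records:
    grouped[key].append((rtype, value))

# "Reduce": collapse each intersection's records into one feature row
for intersection_id, values in grouped.items():
    accidents = sum(v for t, v in values if t == "accident")
    volumes = [v for t, v in values if t == "traffic_volume"]
    avg_volume = sum(volumes) / len(volumes) if volumes else 0
    print(intersection_id, {"accidents": accidents, "traffic_volume": avg_volume})
```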

The last step was to calculate correlation metrics on the data. To do this, we used Apache Spark. We segmented the data set into thirds by traffic volume, giving us low, moderate, and high traffic volume data sets. We then calculated Spearman and Pearson correlation coefficients between the accident rate and the individual features and analyzed the results. Although most features showed very little correlation with the accident rate, a few produced a moderate level of correlation. First, we found a moderate positive correlation between accidents and the presence of traffic lights. This seemed odd at first, but on second consideration it made sense: I have seen many random acts of bravery at traffic signals, where people try to cross the street just as the light is changing. Second, we found a moderate negative correlation between high traffic volume and accidents. Again, this was not immediately intuitive, but our speculation was that drivers and pedestrians are more cautious at busy intersections.
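
The correlation step itself is a one-liner in Spark. The sketch below shows the general shape of it using PySpark’s mllib Statistics.corr with made-up per-intersection values; it is an illustration of the technique rather than our exact job.

```python
from pyspark import SparkContext
from pyspark.mllib.stat import Statistics

sc = SparkContext(appName="intersection-correlation")

# Illustrative data: per-intersection accident counts and a 0/1 traffic-light flag
accidents = sc.parallelize([3.0, 0.0, 5.0, 1.0, 7.0])
traffic_lights = sc.parallelize([1.0, 0.0, 1.0, 0.0, 1.0])

pearson = Statistics.corr(accidents, traffic_lights, method="pearson")
spearman = Statistics.corr(accidents, traffic_lights, method="spearman")
print(pearson, spearman)
```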

As this project was only a few weeks long, we didn’t have time to do a more in-depth analysis. I think we would have found even more interesting results with a proper multivariate analysis, which would let us calculate correlation metrics across all variables together instead of examining each feature in isolation. One observation we made was that intersections in high-traffic business or tourist areas have different accident profiles than intersections in residential areas. It would therefore be wise to include more socio-economic information for each intersection, such as land-use and population data.

Despite the time constraints, the small amount of analysis we did was very interesting and made me look at something as simple as crossing the street in a whole new light.

Super Mario Clouds


I took the Super Mario Clouds class this weekend at NYC Resistor. It was taught by Jonathan Dahan and David Huerta. The goal of the class was to recreate Cory Arcangel’s Super Mario Clouds project. This involved hacking a Super Mario Brothers cartridge to show just the clouds. Here is a video of his finished project:

The first step was to take apart a Super Mario Brothers cartridge. Interestingly enough, the actual board inside was much smaller than the cartridge. The game was divided into two chips: character/sprite data on the left chip and program data on the right chip. Since we were going to use the existing Super Mario sprite data, we only needed to remove the program chip. After carefully desoldering the program chip, we replaced it with a socket and a 27C256 EEPROM.

Disassembled SMB

Next, we talked about how the Nintendo works. There were some very helpful explanations in the Nerdy Nights Tutorials. The Nintendo uses a custom 6502 processor with the audio processing built into the chip, and a separate picture processing unit (PPU) handles the graphics. The program ROM is limited to a mere 32 KB, so Nintendo had to do some clever graphics manipulation in order to create a rather seamless side-scrolling experience.


The next step was to download Cory Arcangel’s Super Mario Clouds code. There was quite a bit of software setup required to compile and run the code from scratch. As a word of warning: some of these tools work best on Windows. As I am a Mac user, I used a Windows virtual machine in VirtualBox. To run the code, we used the FCEUX Nintendo emulator. Here is a screenshot of the original code as rendered by the emulator:

The original Super Mario Clouds code running in FCEUX

But what if we wanted to modify the code? The original Super Mario Clouds code was written in nbasic. To create a new binary, we first had to compile the nbasic source into 6502 assembly and then assemble it into a ROM with nesasm. Those tools can be found here. Finally, an NES splitter is needed to split the resulting .nes file into .chr and .prg files (for the character and program chips, respectively).

Once we had the .prg file, it was time to burn our programs to the chip. This was done using an EEPROM programmer. We simply selected the chip type, uploaded our binary and let the programmer software do the rest.

Programmer

After uploading our programs to the cartridge, it was time for the moment of truth. We plugged our cartridges into an old Nintendo and hoped for the best. Fortunately, most of the cartridges worked on the first try!

Nintendo hack

This project was a great way to spend an afternoon with my head in the (Super Mario) clouds. Now I really want to create my own Super Mario piece. I never thought I would be so excited to write assembly code…

Probabilistic Postal Address Elementalization

I haven’t posted in a few months because most of my time has been consumed with work and school. With respect to school, I have been taking Statistical Natural Language Processing at New York University. As my final project for this class, I have been working on something which I have been curious about for quite some time – probabilistic postal address elementalization.

What exactly does “postal address elementalization” mean? This is the process of breaking a postal address into tokens and classifying the function of each token, such as house number, street suffix or zip code. For my experiment, I created twelve distinct classes: five for administrative areas (country, state, county, city and neighborhood), one for postal codes and six for street address components (house number, prefix, pre-directional, base street name, post-directional and suffix). An example of an elementalized address is below:

An example of an elementalized address
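
To make the classes concrete, here is a hypothetical example (not the address from the figure) of how an input string should be split and labeled:

```python
# Hypothetical address and the labels the tagger should produce
address = "350 W 31st St, New York, NY 10001"
elements = [
    ("350", "house number"),
    ("W", "pre-directional"),
    ("31st", "base street name"),
    ("St", "suffix"),
    ("New York", "city"),
    ("NY", "state"),
    ("10001", "postal code"),
]
```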

Although this seems like a straightforward problem, it is complicated by the fact that many countries have different languages and address formats. For example, in Ghana and Cameroon, there is no standard postal code system. In the Netherlands and Ireland, there is no province or state in the address. In some countries, such as France and Mexico, the postal code is placed before the city, whereas other countries place it after the city. Furthermore, terms from different classes can share the same surface form, which makes disambiguation tricky: in Avenue N in Brooklyn, “N” could be a post-directional or the base street name, and in N Broadway in St. Louis, “St.” could be a street suffix or part of the city name.

Traditionally, address elementalization has been implemented by building rule-based systems to handle each individual address format. This makes the process of building international address parsers very time consuming as one would need to implement a new rule-based parser for every single country. By building a statistical model to do this, implementing a new international format is as simple as training the model with a new data set.

Some interesting characteristics of postal addresses are that they have a grammar (as defined by the address format) and the elements are contextually dependent. These traits make the problem well-suited for natural language processing. There are natural language processing techniques that are used for similar purposes, namely part-of-speech taggers which are used to classify the parts of speech in a sentence.

For my final project, I looked at four different techniques of statistical part-of-speech tagging and applied them to the problem of postal address elementalization. The code is here.

The first strategy was a Hidden Markov model (HMM) tagger. HMMs are statistical models that can be used to find the most likely sequence of states for an input. This is done by using transition probabilities (the probability of a state given the previous state) and emission probabilities (the probability of the current input token given the proposed state). These probabilities are learned by observing a training set. The model then classifies the input left to right, calculating the probability of a proposed state as the product of the transition probability, the emission probability, and the maximum probability from the previous state. The model tries different state combinations over the input and returns the sequence of states with the highest overall probability. A variation of this is the trigram HMM tagger, which conditions the transition probability on the two previous states.
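
To make the search concrete, here is a minimal bigram Viterbi sketch with hand-picked toy probabilities for just three classes; the real tagger estimates these probabilities from the training addresses and covers all twelve classes.

```python
import math

# Toy transition and emission tables for three classes
states = ["house_number", "base_street_name", "suffix"]
transition = {  # P(state | previous state); "<s>" marks the start of the address
    "<s>":              {"house_number": 0.9, "base_street_name": 0.1, "suffix": 0.0},
    "house_number":     {"house_number": 0.0, "base_street_name": 0.9, "suffix": 0.1},
    "base_street_name": {"house_number": 0.0, "base_street_name": 0.3, "suffix": 0.7},
    "suffix":           {"house_number": 0.1, "base_street_name": 0.5, "suffix": 0.4},
}
emission = {  # P(token | state)
    "house_number":     {"350": 0.8},
    "base_street_name": {"350": 0.1, "Main": 0.8, "St": 0.1},
    "suffix":           {"St": 0.9},
}
EPS = 1e-12  # stand-in for unseen events (no real smoothing here)

def viterbi(tokens):
    # best[i][s] = (log probability of the best path ending in state s at token i, backpointer)
    best = [{s: (math.log(max(transition["<s>"][s], EPS) * emission[s].get(tokens[0], EPS)), None)
             for s in states}]
    for i in range(1, len(tokens)):
        row = {}
        for s in states:
            e = emission[s].get(tokens[i], EPS)
            prev, score = max(
                ((p, best[i - 1][p][0] + math.log(max(transition[p][s], EPS) * e))
                 for p in states),
                key=lambda pair: pair[1])
            row[s] = (score, prev)
        best.append(row)
    # Trace the backpointers to recover the highest-scoring tag sequence
    tag = max(states, key=lambda s: best[-1][s][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return list(reversed(path))

print(viterbi(["350", "Main", "St"]))  # -> ['house_number', 'base_street_name', 'suffix']
```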

The second strategy was a Maximum-Entropy Markov model (MEMM) tagger. MEMMs are similar to HMMs in that they try to find the sequence of states with the maximum total probability for an input. However, instead of just using observed counts for the transition and emission probabilities, we train a maximum-entropy distribution to calculate the probabilities. The main advantage of doing this is that we can have custom features factor into the probability, such as the current token length or whether the token contains a number. This allows for greater flexibility in tuning the tagger, but it makes classification slower.
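
My actual feature set differed, but this sketch shows the kind of feature dictionary a maximum-entropy classifier scores for each token; note that the previous tag is itself a feature, which is where the Markov part of the MEMM comes in.

```python
def memm_features(tokens, i, prev_tag):
    """Build the feature dictionary for token i (illustrative feature set)."""
    token = tokens[i]
    return {
        "token": token.lower(),
        "prev_tag": prev_tag,  # Markov dependence on the previous state
        "length": len(token),
        "contains_digit": any(c.isdigit() for c in token),
        "all_caps": token.isupper(),
        "position": i,
    }

# e.g. the features for "St" given the previous tag
print(memm_features(["350", "Main", "St"], 2, "base_street_name"))
```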

The third strategy was a Transformation-Based Learning (TBL) tagger, also known as a Brill tagger. The TBL tagger generates rules based on observations in training. These rules specify conditions under which one tag should be replaced with another. During classification, initial tags are assigned to the tokens based on the most probable tag observed in training. The tagger then iterates through the rules learned in training, swapping tags as necessary, until there are no more rules to apply or a given score threshold is met.
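
NLTK ships an implementation of Brill’s algorithm, which is roughly what this looks like in practice (the training pairs here are toy data, and this is an illustration of the technique rather than my exact code):

```python
from nltk.tag import UnigramTagger
from nltk.tag.brill import brill24
from nltk.tag.brill_trainer import BrillTaggerTrainer

# Toy training sentences in NLTK's (token, tag) format
train_sents = [
    [("350", "house_number"), ("Main", "base_street_name"), ("St", "suffix")],
    [("12", "house_number"), ("Broadway", "base_street_name")],
]

# Initial tagger: assign each token its most frequent tag from training
initial = UnigramTagger(train_sents)

# Learn transformation rules that patch the initial tagger's mistakes
trainer = BrillTaggerTrainer(initial, brill24(), trace=0)
tagger = trainer.train(train_sents, max_rules=20)

print(tagger.tag(["350", "Broadway"]))
```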

The fourth strategy was a Conditional Random Field (CRF) tagger. CRFs use a number of feature functions which take the proposed states and the input observation and return a value between 0 and 1. These features, like the features in MEMMs, can be just about anything, such as whether the current token is capitalized or whether it ends in -ed. Each feature function has a weight learned from observations in training. The score for a sequence of states is calculated by summing the weighted feature functions over all words and then normalizing. Just like HMMs and MEMMs, we find the sequence that maximizes the overall probability.
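
As a generic illustration of the pattern (using the sklearn-crfsuite library, not necessarily what my code does), training a linear-chain CRF looks roughly like this:

```python
import sklearn_crfsuite

def token_features(tokens, i):
    """Per-token feature dictionary; real features would be richer."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "contains_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# One toy training address; the real model is trained on many labeled addresses
sentences = [["350", "Main", "St", "New", "York", "NY", "10001"]]
labels = [["house_number", "base_street_name", "suffix", "city", "city", "state", "postal_code"]]

X = [[token_features(sent, i) for i in range(len(sent))] for sent in sentences]
y = labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```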

From my initial work, I found that the Maximum-Entropy Markov model and the Conditional Random Field taggers consistently had the highest overall accuracy of the group. Both consistently had accuracies over 98%, even on partial addresses. For full format American addresses, these taggers had an accuracy around 99.7%. If I had to choose between them, I would go with the Conditional Random Field tagger as it was considerably faster in tagging the addresses.

I didn’t have enough time to finish everything that I wanted to implement, so this is still a work in progress. I still want to implement some smoothing techniques for unseen tokens and transitions, such as Katz backoff and Kneser-Ney interpolation. There are also a few more part-of-speech tagging techniques I would like to experiment with. I guess that old adage is true in this case – time flies when you are having fun…  🙂