BostonBusMap developer blog

Hey everyone! This blog is meant to show people what developing this app is like for me. I will frequently get into details, so it would probably be helpful to have some background in programming when reading this blog. You don't have to read this to use the app; hopefully it's simple enough so that people can play around with it and figure everything out. If you have any comments or questions, email me at bostonbusmap@gmail.com

Thursday, June 14

Suffix arrays on Android

I’ve been doing a lot of work on the app recently, trying to get it back to feature parity with the old database-driven version. A lot of code is now generated with a python script, mostly to populate data. It’s proven easier to manipulate this way than it was in the database.

I managed to get suffix arrays working on my app! Suffix arrays are efficient data structures which store all the suffixes of strings you are searching. There are roughly 400,000 suffixes in every transit stop title string in the system, so efficiency is important. There is a list of indexes for each suffix which are in alphabetical order. This is an int array, so it should take around 1.6 megabytes of memory when in use. (There is also a list of each whole string, but those are just references and the length of this array is small in comparison.)

Sorting those indexes took a long time because I used a compressed list with log(n) lookup. I got around this by doing the sorting on my laptop and hardcoding the results in autogenerated code. The performance is pretty quick now, but the amount of autogenerated code has nearly doubled the size of the app. It’s now roughly 2.4 mb as an APK and nearly 6 mb on disk (although this will go to the SD card for users with Android 2.2 or later phones.)

Thursday, June 7

Suffix arrays and autogenerated code

I’ve been playing around with getting rid of most of the database and the routeconfig XML and replacing it with autogenerated code.

Pros:

  • It would simplify things and result in a performance increase, at least the first time they use the app. 
  • It allows me to do compile time computation and optimization, which makes other data structures (like RTrees) possible
  • Easier to write tests
  • No more obscure database corruption errors
Cons:
  • I would lose the ability to update routes from new information. However, this isn’t really being done now except to download route paths, which were kept separate due to space concerns.
  • Java is picky about too much data in one class or function, so I’ll need to be creative about breaking them up
I’m about 80% done with an experimental build with this feature!
On a related note, I wrote a suffix array implementation for use in my app: https://github.com/bostonbusmap/compressed-generalized-suffix-array
Its main difference with existing implementations is that it works efficiently with objects that have some string property you want to index. Creating new objects for every suffix uses a lot of memory, even when it’s just a wrapper class (in the case of String.substring(start, end)). This implementation only uses integer arrays and an ArrayList to store your already existing objects.

Sunday, November 13

Keeping route info updated

Life got really busy so I haven’t had much time to work on the app recently. Anyway, I wrote a script to help keep the app up-to-date. Specifically, the data from the routeconfig command changes from time to time for various reasons, and there currently doesn’t seem to be any easy way to know when the data has changed without downloading all of it. These changes can prevent the app from working correctly. For instance, different direction tags can cause directions to fail to display correctly, and the transit authority may remove or add stops from their routes. There are also two versions for Los Angeles and Toronto now, and because I don’t use those apps regularly it’s hard to know when something goes wrong.

A little while ago I took a course in parallel computing, and as part of that course I got some EC2 time for free. EC2 is pretty nice, and their lowest tier option is free, at least for a limited period of time. After a long period of experimentation I wrote a script, using cron, to download all the routeconfig data for a particular agency several times a day (spacing each route some seconds apart to avoid taxing their servers), filter out the route paths, concatenate into a large text file, and then gzip. A MD5 checksum is done to see if there was any difference from the last time it downloaded the data, and if it is different it uses the email program nail to send me the file as an attachment. Here’s the script:

backup_folder=backup_`date +%s` branches='sf-muni master toronto la' mkdir $backup_folder rm -f latest_backup ln -s $backup_folder latest_backup for branch in `echo $branches`; do pushd bostonbusmap pushd bostonBusMap git checkout $branch pushd tools rm route* ./build-routeconfig.sh mkdir ~/$backup_folder/$branch cp -f routes ~/latest_backup/$branch cp -f routeList ~/latest_backup/$branch cp -f routeconfig_full.xml.gz ~/latest_backup/$branch popd popd popd cp ~/latest_backup/$branch/routeconfig_full.xml.gz ~/latest_routeconfig.xml.gz cp ~/previous_backup/$branch/routeconfig_full.xml.gz ~/prev_routeconfig.xml.gz gzip -f -d ~/latest_routeconfig.xml.gz gzip -f -d ~/prev_routeconfig.xml.gz current="`md5sum ~/latest_routeconfig.xml | awk '{print $1}'`" prev="`md5sum ~/prev_routeconfig.xml | awk '{print $1}'`" gzip ~/latest_routeconfig.xml echo "current md5sum is $current" >> out.mail echo "previous md5sum is $prev" >> out.mail if [ "$current" != "$prev" ]; then echo "See attachment. $branch needs fixing" | /usr/bin/nail -s "DIFFERENCE FOUND in $branch" -a ~/latest_routeconfig.xml.gz my-email-address@gmail.com fi done rm previous_backup mv latest_backup previous_backup

and here’s the crontab:

30 2 * * * /usr/bin/screen -dmS routeconfig ~/create_backup.sh 
30 10 * * * /usr/bin/screen -dmS routeconfig ~/create_backup.sh 
30 18 * * * /usr/bin/screen -dmS routeconfig ~/create_backup.sh

It’s pretty simple but it gets the job done

Wednesday, June 1

Performance debugging

I discovered what was causing the long hiccups in my app for Android 1.6 users. I was instantiating time zone and time format objects every time the predictions for the current stop change. This happens often because the seconds or minutes on the popups should count down whenever a redraw happens. It took me a long time to find that. I tried tracking allocations but it only shows the last 512 allocations and the stack trace is truncated to the closest 8 levels, which obscured that this was happening in my code.

To reproduce the problem, I ran the emulator with the debugger attached, and I hit the pause button around halfway through, give or take. The first time I stopped, the stack trace stopped in the TimeZone constructor. It was just a call to

new TimeZone("America/New_York")

which I simply replaced with a function that cached the object. The second time I stopped at this point

android.text.format.DateFormat.getTimeFormat(context)

which I replaced with a slightly more complicated function to cache this value. Overall, the stop-the-debugger-randomly idea got results a lot quicker than looking at heap dumps and allocation traces. This isn’t as much of a problem in later Android releases because the garbage collection process is smoother (and most modern Android phones are significantly faster than older ones.)

Tuesday, May 31

Floating point accuracy

There was a pretty bad error in my app that was basically caused because I replaced this code:

double dist = (((1.0 - Math.cos(lat1 - lat2)) * 0.5) + Math.cos(lat1) * Math.cos(lat2) * ((1.0 - Math.cos(lon1 - lon2)) * 0.5)); return (float) (dist * radiusOfEarthInMiles);


with this:

float dist = (((1.0f - FloatMath.cos(lat1 - lat2)) * 0.5f) + FloatMath.cos(lat1) * FloatMath.cos(lat2) * ((1.0f - FloatMath.cos(lon1 - lon2)) * 0.5f)); return dist * (float)radiusOfEarthInMiles;

Basically, the stop locations are sorted by proximity to the center of the screen at the time, and I thought I could shave some time off by using floats instead of doubles. The Android docs say that on at least one platform, FloatMath functions can run in 33% of the time of regular trigonometric functions (source: http://developer.android.com/reference/android/util/FloatMath.html)
The calculations became wildly inaccurate in some cases, which prevented some people from getting updates for certain stops. Only 15 or so closest stops to the center of the screen are updated when the app polls the server for new information, due to bandwidth constraints mostly. Since the distance algorithm was inaccurate, some stops were placed outside that top 15 and some stops weren’t updated correctly.
I took a numerical computation class in college; I should know better. I didn’t appreciate how something that seems fuzzy like distance calculations can impact usability

Saturday, May 21

Documentation

I bought a doughnut and they gave me a receipt for the doughtnut… I don’t need a receipt for the doughnut. I give you money and you give me the doughnut, end of transaction. We don’t need to bring ink and paper into this. I can’t imagine a scenario that I would have to prove that I bought a doughnut. To some skeptical friend, ‘Don’t even act like I didn’t get that doughnut, I’ve got the documentation right here… It’s in my file at home. …Under “D”.’ - Mitch Hedberg

I am forever grateful that the Android docs are both well written and kept up to date. Compare the Android performance guidelines:

http://developer.android.com/guide/practices/design/performance.html

to the BlackBerry ones:

http://docs.blackberry.com/en/developers/deliverables/5716/BP_Writing_efficient_code_446999_11.jsp

In particular I like these quotes from the Android guidelines:

Previous versions of this document made various misleading claims. We address some of them here.

Every claim made in this document is backed up by a benchmark. The source to these benchmarks can be found in the code.google.com “dalvik” project.

The Android performance guidelines were updated when Android 2.2 shipped with a JIT dalvik engine. There were also corrections made based on incorrect assumptions in an earlier draft of the document. The fact that these guidelines are kept up to date makes my work easier and more fun. 

On an unrelated note, I recently found out about the Hierarchy Viewer: http://developer.android.com/guide/developing/debugging/debugging-ui.html

It’s pretty awesome. You can use it on a running app to see all the widget outlines, all the widget object information, the exact image that each widget composes, and the view hierarchy in general. 

Sunday, May 15

Monday, March 28

Startup times

I was going to write a long post, complete with figures, but to summarize, I haven’t gotten anywhere. The app needs to initialize a database with the description of the bus and subway systems, and it can take a couple minutes on a slow phone. It only happens once so it shouldn’t be a big deal, but it’s a sensitive operation and two minutes is a long time to have the app screw something up, by shutting down or crashing or etc.

Currently the data is stored as a gzipped xml file containing all the routeConfig info for the bus system. There’s a script in the tools directory that downloads and assembles this.

Things I’ve tried:

  • I tried storing XML in a couple other ways: in res/xml (slow, and file sizes need to be pretty small) and in res/raw uncompressed (file size limitation again).
  • I tried the XmlPullParser instead of SAX. No noticeable improvement
  • I tried using a script to convert the XML to SQL statements, which would then be used directly. I think there’s a significant speedup here, although it’s hard to tell.
  • Compressed SQL statements vs uncompressed didn’t seem to matter. I guess gzip is not the bottleneck
  • Precompiled statements didn’t seem to affect it much either. It also seems to get slower the more I insert into the database. However there don’t seem to be any constraints as I’m adding the data (I’m applying them afterwards). Neither did putting everything inside a single transaction

My Nexus One seems to have a significant speedup after these optimizations, the simulator less so. I’m not sure why that is. In any case the Nexus One is fast enough already, it’s the slower phones that would be affected most by improvements.

The obvious thing left to try is creating the sqlite database on my laptop and shipping it with the apk. I would still need to copy the data onto the device’s memory, although this would just give me more flexibility and wouldn’t necessarily be a solid limitation. I could just hard code everything, generating a Java file with a script, but it would make it harder to update the app with new information in the future.

    Tuesday, February 22

    Plans

    It’s been a long time since the last post. I’m going to keep this blog post short and just make it a list of things I would like to do with the app (even though I might not get to it all):

    • OpenStreetMaps (http://www.openstreetmap.org/)? It might be fun, although the color scheme is a little unpleasant in comparison. Maybe I can offer a way for people to compare the two? If OpenStreetMaps exposes a directions API, I’m sold
    • Fix bugs: the app can be put in an unstable state if you start it and quit midway through the update. It should at least detect this and try again.
    • Fix bugs: the times should not update when you’re looking at the “More info” view
    • Some kind of instructions for beginners
    • Filtering in More Info screen
    • Honeycomb tablet support

    My life has gotten busy lately so the pace of updates has slowed down. I still enjoy working on this though. I appreciate all the work Nextbus and MassDOT have put into their predictions feed, and it pays off every extra minute I get back in my life

    Friday, November 26

    Sorting by location

    “Nothing makes the earth seem so spacious as to have friends at a distance; they make the latitudes and longitudes.” - Henry David Thoreau

    I’m trying to polish things up and think about what new things I can do with it. Looking up single stops by tag or title means I can do search suggestions on those parameters. One thing which would be great is the ability to sort by distance, to find the 15 closest stops to you for instance. You can do this with MYSQL but SQLite apparently lacks the necessary trig functions to make this easy. I found a few stack overflow questions about this:

    http://stackoverflow.com/questions/3695224/android-sqlite-getting-nearest-locations-with-latitude-and-longitude

    http://stackoverflow.com/questions/3126830/query-to-get-records-based-on-radius-in-sqlite

    http://stackoverflow.com/questions/2034348/android-sqlite-sort-on-calculated-column-co-ordinates-distance

    One mentioned turning the lat and lon into a geohash, which is an interesting idea. The computed hash number has the property where geographically closer locations have closer hashes. However, the wikipedia article says that geohashes define a bounding box and locations outside this box would need to be recomputed. Link: http://en.wikipedia.org/wiki/Geohash

    I’ll probably keep things the way they are for now, which means getting everything from the database and sorting with Java. This is only inefficient if I sort by distance from a location every stop. (Currently there’s only the option to look at all stops of a certain route or all that are favorited.)

    A while back when I started working on the app I was trying to figure out how to do this in Java to get the 20 or so closest buses to the center of the map. My CEO at SoftArtisans told me about the great circle distance. I think this is the link he sent me: http://www.movable-type.co.uk/scripts/gis-faq-5.1.html

    There’s another formula I found out about later which is a little more efficient but the result of the formula was only useful in comparisons (ie, the result is not a distance but the larger the result, the farther the distance). It’s on the boost.geometry blog: http://barendgehrels.blogspot.com/2010/07/efficient-distances.html

    One immediate benefit is that I can now easily find out how many stops exist at a particular point. This will finally make favorites reliable. Previously, clicking the star would favorite the stop at the location, and it would search the other routes mentioned by the stop to see if there were stops that also existed at that exact location. That’s not quite all the stops at a location, and therefore you only see certain bus routes for a stop that you favorited.