Migrating the Manchester Trams API to Cloudflare
November 11, 2024
Early last week, an idea came to me to explore the available Public Transport APIs I'd discovered through working on the Trams and Buses apps again. Specifically, there's an endpoint that returns scheduled departures for Bus Stops around the UK, and remembering that Tram Stops are treated as Bus Stops in the NAPTAN database, I wanted to see if calling that endpoint with a Tram Stop identifier would return scheduled Tram times.
Turns out, it does!
And so this article details how and why a new endpoint led me to migrate the existing Manchester Trams API from a VPS to Cloudflare.
Why migrate?
Simply put, the current API is cumbersome to work on. It's an Adonis API running on a cheap VPS with a MySQL database, and whilst it's stood the test of time so far, applying updates to the API has been a tedious process which consists of:
- Waiting for minimal usage on the app,
- Pulling the latest commit,
- Building the app,
- Restarting it with PM2,
- Testing with the live app.
(This was before I learnt about Docker, which would have made this a lot easier, but alas.)
These updates would usually occur late at night when the Trams start to wind down.
There are a few other smaller issues too, such as the VPS being hosted outside the UK, and my wanting to expand my small Public Transport empire in the future, which the current API isn't built to support.
Setting up Cloudflare
For this, I'd be using Workers, D1, KV and tying them together with Hono. The first step was getting the database created and tables migrated.
As Wrangler has a built-in command to apply migrations to a database, all I had to do was create a migrations folder and a stations.sql file.
CREATE TABLE stations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
slug TEXT NOT NULL,
address TEXT NOT NULL,
postcode TEXT NOT NULL,
longitude REAL NOT NULL,
latitude REAL NOT NULL,
facilities TEXT NOT NULL DEFAULT '{"accessibility": [], "Car parking": [], "Cycling": [], "Other facilities": [], "Tickets": []}',
zones TEXT NOT NULL DEFAULT '[]',
lines TEXT NOT NULL DEFAULT '[]',
platforms TEXT NOT NULL DEFAULT '[]'
);
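Applying it was then a single Wrangler command, something along the lines of wrangler d1 migrations apply tram-api --remote, where tram-api is a placeholder for whatever database name you've configured in wrangler.toml.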
The existing tables have some funky columns, so I took the opportunity during the migration to clean them up. For example, the arrays within "facilities" were all separate JSON columns, and "zones" was spread across three separate columns (zone, is_edge_station, edge_zone).
I also created another table for users, with the usual columns like email, username and password. This is so that, in the future, I can give API access to the few users who've asked for it before.
Migrating existing data
Next, it was onto the Cloudflare Worker, where I set up a new Hono application and endpoints for inserting, updating and reading from the Stations table.
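A rough sketch of what that looks like is below; the binding name, routes and exact queries are illustrative rather than the real code.

import { Hono } from 'hono'

// D1Database comes from @cloudflare/workers-types; "DB" is whatever binding name is set in wrangler.toml
type Bindings = { DB: D1Database }

const app = new Hono<{ Bindings: Bindings }>()

// Read all stations
app.get('/stations', async (c) => {
  const { results } = await c.env.DB.prepare('SELECT * FROM stations ORDER BY name').all()
  return c.json(results)
})

// Read a single station by its slug
app.get('/stations/:slug', async (c) => {
  const station = await c.env.DB
    .prepare('SELECT * FROM stations WHERE slug = ?')
    .bind(c.req.param('slug'))
    .first()
  return station ? c.json(station) : c.notFound()
})

// Insert a new station
app.post('/stations', async (c) => {
  const body = await c.req.json()
  await c.env.DB
    .prepare('INSERT INTO stations (name, slug, address, postcode, longitude, latitude, facilities, zones) VALUES (?, ?, ?, ?, ?, ?, ?, ?)')
    .bind(body.name, body.slug, body.address, body.postcode, body.longitude, body.latitude, JSON.stringify(body.facilities), JSON.stringify(body.zones))
    .run()
  return c.json({ created: true }, 201)
})

// Update a station, e.g. to attach platform data later on
app.patch('/stations/:slug', async (c) => {
  const body = await c.req.json()
  await c.env.DB
    .prepare('UPDATE stations SET platforms = ? WHERE slug = ?')
    .bind(JSON.stringify(body.platforms ?? []), c.req.param('slug'))
    .run()
  return c.json({ updated: true })
})

export default app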
Then it was onto migrating the existing data. I started by saving the JSON response from the existing /stations endpoint and writing a script to transform it into the new layout; mainly, this consisted of building the facilities arrays and sorting out zones. After 20 minutes or so, I had all 99 stations in the new database.
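The transform script itself was a throwaway, something along these lines, with the old field names being from memory rather than exact:

import { readFile } from 'node:fs/promises'

// Old field names are assumptions; the real response had separate JSON columns per facility type
type OldStation = {
  name: string; slug: string; address: string; postcode: string;
  longitude: number; latitude: number;
  accessibility: string[]; car_parking: string[]; cycling: string[];
  other_facilities: string[]; tickets: string[];
  zone: number; is_edge_station: boolean; edge_zone: number | null;
}

const oldStations: OldStation[] = JSON.parse(await readFile('./stations.json', 'utf8'))

const transformed = oldStations.map((s) => ({
  name: s.name,
  slug: s.slug,
  address: s.address,
  postcode: s.postcode,
  longitude: s.longitude,
  latitude: s.latitude,
  // Collapse the separate facility columns into one object
  facilities: {
    accessibility: s.accessibility,
    'Car parking': s.car_parking,
    Cycling: s.cycling,
    'Other facilities': s.other_facilities,
    Tickets: s.tickets,
  },
  // Collapse zone / is_edge_station / edge_zone into a single zones array
  zones: s.is_edge_station && s.edge_zone !== null ? [s.zone, s.edge_zone] : [s.zone],
}))

// POST each station into the new Worker (the URL is a placeholder)
for (const station of transformed) {
  await fetch('https://trams-api.example.workers.dev/stations', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(station),
  })
}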
However...
During earlier testing of the Scheduled Departures endpoint with a random Platform ID I had lying around, I noticed the ID only correlated to one platform at the actual Station. This made sense, but it meant that my current system, which retrieves departures via a Station ID, wouldn't work, as stations don't store any platform data.
Essentially, I had to go and hunt down all 99 stations' worth of Platform data in the NAPTAN database: a 434,000-row CSV file from the Department for Transport's website.
This is where the real fun started: how exactly do I find 99 stations' worth of Platforms in this huge file?
Well, assuming each station has a minimum of 2 Platforms (some have 3 or 4), that meant I was looking for roughly 200 - 300 rows of data. Just like finding a needle in a haystack, but here's where the real trick came in.
The Platform identifier I'd been using so far started with 9400ZZMA. Whilst I don't know what the beginning characters relate to, the last two, "MA", mean Manchester, and so I could quickly select nearly 300 records from the CSV and copy them into a new file.
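In code terms, the filter was only a few lines; something like the below, assuming the ATCO code is the first column of the NAPTAN export:

import { readFile, writeFile } from 'node:fs/promises'

// Keep the header row plus every row whose ATCO code starts with the Metrolink prefix
const rows = (await readFile('./Stops.csv', 'utf8')).split('\n')
const manchester = rows.filter((row) => row.startsWith('9400ZZMA'))
await writeFile('./metrolink-platforms.csv', [rows[0], ...manchester].join('\n'))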
Importing new Platform data
After some clean-up of non-existent platforms, I was left with roughly 210 rows ready to be processed into the new D1 database.
Each row consisted of a Station name, Platform ID, and an Indicator (usually "To" or "From" a location like the city centre). I wrote a quick script that processed this CSV into API calls (sketched a little further below). Each call would use a slug-ified version of the station name to find the correct record in the DB and pass a body containing the new Platforms array, like so:
{
"platforms": [
{
"atcocode": "9400ZZMA...",
"indicator": "To Manchester",
"direction": "inbound"
},
{
"atcocode": "9400ZZMA...",
"indicator": "From Manchester",
"direction": "outbound"
}
]
}
The direction property was determined by whether the indicator value contained "To" or "From". This value then displays in the existing app above each set of times on a Station page.
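The script itself looked roughly like this; the column order and Worker URL are placeholders, and the naive comma split only works because the trimmed-down file has no quoted fields:

import { readFile } from 'node:fs/promises'

const rows = (await readFile('./metrolink-platforms.csv', 'utf8'))
  .split('\n')
  .slice(1) // skip the header
  .filter(Boolean)

// Group the platform rows by their parent station's slug
const platformsByStation = new Map<string, { atcocode: string; indicator: string; direction: string }[]>()

for (const row of rows) {
  const [atcocode, stationName, indicator] = row.split(',')
  const slug = stationName.toLowerCase().replace(/[^a-z0-9]+/g, '-')
  const direction = indicator.startsWith('To') ? 'inbound' : 'outbound'
  const platforms = platformsByStation.get(slug) ?? []
  platforms.push({ atcocode, indicator, direction })
  platformsByStation.set(slug, platforms)
}

// One PATCH per station, passing the full platforms array as the body
for (const [slug, platforms] of platformsByStation) {
  await fetch(`https://trams-api.example.workers.dev/stations/${slug}`, {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ platforms }),
  })
}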
Final tweaks and tests
After the migration of Stations was done, it was time to do some testing and tweaking. The first step was to add some authentication via a middleware that runs on each request.
Initially, I tested checking a token against the Users table before allowing a request to fetch stations, but each request was a consistent 1.2 seconds (too long).
So I pivoted to JWTs using Hono's JWT middleware, which was 50% faster at 0.6 seconds. However, a token using this method would work if I wanted someone to have access for just a day or two, but it wouldn't do for my current apps' requirements.
Finally, I tested storing API Tokens in Cloudflare's KV storage, which, with response times between 0.7 and 0.9 seconds, felt like a nice compromise between the two previous solutions. Additionally, I added caching to all the endpoints except live departures (for obvious reasons), and that brought response times down to about 0.2 - 0.3 seconds, a worthy time!
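The KV check boils down to a small middleware, roughly like the below; the binding and header names are assumptions, and the cache middleware is Hono's built-in one:

import { Hono } from 'hono'
import { cache } from 'hono/cache'

// KVNamespace / D1Database types come from @cloudflare/workers-types
type Bindings = { DB: D1Database; TOKENS: KVNamespace }

const app = new Hono<{ Bindings: Bindings }>()

// Reject any request whose bearer token isn't a key in the TOKENS KV namespace
app.use('*', async (c, next) => {
  const token = c.req.header('Authorization')?.replace('Bearer ', '')
  if (!token || (await c.env.TOKENS.get(token)) === null) {
    return c.json({ error: 'Unauthorised' }, 401)
  }
  await next()
})

// Cache station reads at the edge; live departures deliberately skip this
app.get('/stations/*', cache({ cacheName: 'trams-api', cacheControl: 'max-age=3600' }))

// ...the station routes from earlier are then registered after the middleware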
It's worth noting here that the current VPS is hosted in another European country and has response times around the 0.3 - 0.5 second mark, so Cloudflare's response times are marginally better, but nothing to write home about.
Conclusion
In all, this migration took a few evenings to complete as I wanted to check all the data was correct. There's still a bit of work left to do on the mobile apps to support these changes, but with a much simpler system to work with on Cloudflare, I can now begin to iterate on my ideas to democratise public transport data at a much faster rate!
Stay tuned as I have a lot in store, and thanks for reading! 🧙‍♂️