Testing the urban legend that Ryan Air fakes their 'on time' data

The Ryan Air urban legend

‘Ryan Air aren’t really on time the most - they schedule their flights to take longer’

I’ve heard comments like the quote above many times before, yet I’ve never seen any real evidence. I’m tempted to believe it, as Ryan Air is no stranger to crude, attention-grabbing methods. Alluding to charging to use the toilet, standing seats and scamming people with scratchie cards are all topics they’ve covered before. As flight data is pretty easy to get your hands on, I figured it would be a fun side project to see if I could find some data to explore the myth myself.

The data

To test out the “Ryan Air” theory I set up a Python script that would call the flightaware.com API to pull down data on the last 72 hours’ worth of flights, then I left the script running as a cron job for several months.

Overall I pulled flight data on 22 routes where there was Ryan Air and one of five other airlines flying the exact same route. Unfortunately I didn’t find any routes flown by Ryan Air and a non-budget airline, so this comparison is limited to Ryan Air vs. its contemporaries. Across the 6 airlines (Ryan Air plus 5 others), I had full data on scheduled and actual flight times for 3,421 flights. The Python code was actually really simple, and an example that will pull data on one flight is below.

‘Actual time’ and ‘Scheduled time’ are two variables I am relying on, without having been able to 100% nail down what they mean (and it is rather important here). From scanning the API documentation, I believe that both refer to when the plane leaves or touches down on the runway, not to when the plane is scheduled to leave or arrive at the gate.

#!/usr/bin/python
import logging
import time

from suds.client import Client

# change me
username = 'myusername'
apiKey = 'myapikey'
url = 'http://flightxml.flightaware.com/soap/FlightXML2/wsdl'

# connect
logging.basicConfig(level=logging.INFO)
api = Client(url, username=username, password=apiKey)
print(api)  # lists the methods the service exposes

# today
day = time.strftime("%Y%m%d")

# pull a flight
flight = 'EXS898'  # flight number
filename = flight + '_' + day  # name the file with flight num and today's date
result = api.service.FlightInfoEx(flight, 10, 0)  # up to 10 records, offset 0
with open(filename, 'w') as out:  # closes the file once written
    out.write(str(result))
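
Once a record is saved, the quantities compared in the rest of this post can be derived from four timestamps. The following is only a minimal sketch of how that might look. It assumes each flight has already been reduced to a dict of Unix epoch seconds, and the key names (`sched_dep`, `actual_dep`, `sched_arr`, `actual_arr`) are hypothetical placeholders, not the API's own field names.

```python
def flight_metrics(rec):
    """Return delays and durations in minutes for one flight record.

    `rec` is a dict of Unix epoch seconds; the key names are hypothetical.
    """
    sched_duration = (rec["sched_arr"] - rec["sched_dep"]) / 60.0
    actual_duration = (rec["actual_arr"] - rec["actual_dep"]) / 60.0
    return {
        "dep_delay": (rec["actual_dep"] - rec["sched_dep"]) / 60.0,
        "arr_delay": (rec["actual_arr"] - rec["sched_arr"]) / 60.0,
        "sched_duration": sched_duration,
        "actual_duration": actual_duration,
        "duration_diff": actual_duration - sched_duration,
    }


# Illustrative values only: a flight scheduled for 120 minutes that
# leaves 12 minutes late and lands 3 minutes early.
example = {
    "sched_dep": 1400000000,
    "actual_dep": 1400000000 + 12 * 60,
    "sched_arr": 1400000000 + 120 * 60,
    "actual_arr": 1400000000 + 117 * 60,
}
metrics = flight_metrics(example)
print(metrics["dep_delay"], metrics["arr_delay"], metrics["duration_diff"])
```

A negative `duration_diff` means the flight spent less time in the air than scheduled, which is the quantity plotted in the next section.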

Ryan Air vs. Easy Jet

The main airline I found that shared routes with Ryan Air was Easy Jet, with 6 shared routes (3 return journeys) and a total of 1,675 flights in my dataset. Below is a figure showing the distribution of actual minus scheduled flight times. From this plot it seems that, at least on these six routes, both Easy Jet and Ryan Air have flight times that are consistently shorter than the scheduled flight times.

Looking at the distribution of actual flight time, minus the scheduled flight time for Easy Jet and Ryan Air flights.

This suggests that with both airlines you are likely to spend less time in the air than scheduled, which is some very weak evidence that both Ryan Air and Easy Jet pad their scheduled flight times.
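
To put numbers on a distribution like this, one option is to report the median difference and the share of flights that were shorter than scheduled. The sketch below assumes the differences are already available as lists of minutes per airline. The values are made up for illustration and are not taken from the dataset.

```python
from statistics import median


def summarise(diffs):
    """Summarise actual-minus-scheduled flight times (minutes)."""
    shorter = sum(1 for d in diffs if d < 0)
    return {
        "n": len(diffs),
        "median": median(diffs),
        "share_shorter": shorter / len(diffs),
    }


# Made-up numbers for illustration, not values from the dataset
diffs_by_airline = {
    "Ryan Air": [-14, -9, -11, 3, -6],
    "Easy Jet": [-12, -8, 1, -10, -7],
}
summary = {name: summarise(d) for name, d in diffs_by_airline.items()}
for name, s in summary.items():
    print(name, s)
```

A median below zero with most flights on the negative side would match the pattern described above.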

Estimated minutes difference

The plots below show the difference between Ryan Air and Easy Jet. Both models had route as a random effect and were otherwise unadjusted. I chose a random effects model (slopes and intercepts vary across routes) over a standard fixed effects model (where varying intercepts, robust errors, or no adjustment for clustering are three options), as it seemed to be in line with each route being a sample from all possible routes.

  1. Model one (on the left) is testing the estimated effect of a flight being Ryan Air, rather than Easy Jet, on how many minutes early/late it will be relative to scheduled flight time.
    • In this model it appears that Ryan Air is ahead of their schedule on three of the routes relative to Easy Jet, while Easy Jet makes much better time relative to the scheduled flight time on their Copenhagen to London route.
  2. Model two (on the right) is testing whether Ryan Air flights are scheduled to take longer than Easy Jet flights.
    • Here it seems that the reason Easy Jet are usually more ahead of their schedule on the Copenhagen flight is that Easy Jet schedules the flight to take longer.
    • Across the 6 routes though, there seems to be a fairly even spread, suggesting that Ryan Air doesn’t pad their schedules relative to Easy Jet (although maybe that’s because they both do…).
Regression model looking at whether Ryan Air flights are late (according to their own schedules) and then if Ryan Air schedules a longer flight time or not.
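
For a rough feel of what the two models compare, a much simpler stand-in is the per-route difference in means between the two airlines. This is not the random effects model described above: it does no pooling across routes and gives no measure of uncertainty. The route labels and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def route_differences(flights):
    """Mean Ryan Air value minus mean Easy Jet value for each route.

    `flights` is an iterable of (route, airline, minutes) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for route, airline, minutes in flights:
        grouped[route][airline].append(minutes)
    return {
        route: mean(by_airline["Ryan Air"]) - mean(by_airline["Easy Jet"])
        for route, by_airline in grouped.items()
        if "Ryan Air" in by_airline and "Easy Jet" in by_airline
    }


# Hypothetical routes and made-up minutes, for illustration only
flights = [
    ("A-B", "Ryan Air", -10), ("A-B", "Ryan Air", -6),
    ("A-B", "Easy Jet", -4), ("A-B", "Easy Jet", -2),
    ("B-A", "Ryan Air", -3),
    ("B-A", "Easy Jet", -9), ("B-A", "Easy Jet", -7),
]
differences = route_differences(flights)
print(differences)
```

Passing actual minus scheduled flight time as `minutes` mirrors the question in model one, and passing the scheduled flight time mirrors model two.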

So do all the budget airlines pad their numbers?

In addition to pulling data on Ryan Air and Easy Jet, I had data on four other airlines. The following plot shows the distribution of when planes took off and landed (relative to when they were scheduled to). In this plot it appears that all the airlines I looked at would systematically take off late and arrive just a little bit before they were scheduled to.

Distribution of minutes late leaving vs minutes late arriving across 6 airlines, 22 routes, and 3,421 flights.
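
Leaving late and still arriving early is only possible if the flight takes less time than scheduled. The arithmetic can be written as a small sketch, where the function name and the numbers are illustrative only.

```python
def arrival_delay(dep_delay, sched_duration, actual_duration):
    """Minutes late arriving, given minutes late leaving and both durations."""
    return dep_delay + (actual_duration - sched_duration)


# Illustrative numbers only: leave 10 minutes late on a flight scheduled
# for 120 minutes that actually takes 105.
delay = arrival_delay(dep_delay=10, sched_duration=120, actual_duration=105)
print(delay)  # -5, i.e. five minutes early
```

So the more generous the scheduled duration, the later a flight can leave and still count as on time.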

Caveats

I only pulled data on a small selection of Ryan Air flights, where they shared routes with other airlines. So I don’t think I can answer the question with any confidence. I also couldn’t find any documentation that clearly laid out what the scheduled and actual times represent, although from matching up the more detailed website data to the API data, it looks like my interpretation is accurate.

I recently set up my own server on an old laptop, so the plan is to collect more data now that I can schedule cron jobs more reliably.

James Black
PhD (Cantab)

James Black. Kiwi | Epidemiologist | Data Scientist | Engineering enthusiast.
