"2020 is the start of a new decade – or does it start in 2021?" That was the debate on social media as we crossed over into the new year. There was also a lot of talk about remembering Y2K twenty years later, which inevitably led to speculation about how computers will behave in Y2038.
However, what's more interesting to me is just how little we discuss software bugs related to leap years, which occur much more frequently than any of these grand events. In an attempt to remedy this, I'd like to tell you a little about leap year bugs, how you can spot them in your code, and what we have been doing here at Microsoft to prevent them from impacting our products and services.
A leap year is a year that contains an extra day, which we observe on February 29th and call a leap day. Because of the extra day, there are 366 days in the year instead of the usual 365.
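Many think leap years occur every four years, but the exact algorithm is slightly more complicated: a year is a leap year if it is evenly divisible by 4, except that years evenly divisible by 100 are not leap years unless they are also evenly divisible by 400. Thus 2000 and 2020 are leap years, but 1900 and 2100 are not.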
Leap years are an important part of our calendar system, as they keep seasons and astronomical events from drifting from one year to the next. This is because the mean time it takes for the Earth to actually go around the sun is slightly more than 365 days, but not quite 366 days. The leap year algorithm approximates this as 365.2425 days, though it's important to note that no given year has a fractional number of days in it. Rather, we decide when to add an extra whole day and when not to.
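The Gregorian rule (divisible by 4, except centuries, except centuries divisible by 400) is small enough to write out directly. Here is a minimal Python sketch; the function names are illustrative, and Python's own `calendar.isleap` performs the same test. Summing the days over 400 consecutive years shows where the 365.2425 figure comes from: 97 of every 400 years are leap years, giving 146,097 days.

```python
from fractions import Fraction


def is_leap_year(year: int) -> bool:
    """Gregorian leap year rule (same test as Python's calendar.isleap)."""
    # Divisible by 4, except centuries, except centuries divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def mean_year_length(start: int = 2001, cycle: int = 400) -> Fraction:
    """Average length in days of the calendar years start .. start + cycle - 1."""
    total_days = sum(366 if is_leap_year(y) else 365
                     for y in range(start, start + cycle))
    return Fraction(total_days, cycle)


if __name__ == "__main__":
    for y in (1900, 2000, 2020, 2021, 2100):
        print(y, is_leap_year(y))
    # Exact average over one full 400-year Gregorian cycle
    print(float(mean_year_length()))
```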
For more detail on the science and history behind leap years, see the leap year article on Wikipedia, or watch CGP Grey's excellent short video on YouTube.
Note that this information refers to the proleptic Gregorian calendar, which is the primary calendar system used in business and computing today. Other calendar systems (such as the Buddhist calendar, Hebrew calendar, Hijri calendar, and others) have different rules for observing leap days or months.
While these are all fascinating topics, none of them are related to leap years. Even leap seconds, despite having the word "leap" in them, are related to a completely different phenomenon, so I won't dive into them in this post.
A leap year bug is what happens in software when programs that work with dates do not correctly take leap years into account. They might simply misapply the leap year algorithm, or they might ignore the difference between leap years and common years when manipulating dates.
I can demonstrate this without any code at all. Let me ask you a simple question: What date will it be a year from today? Let's say that "today" is January 1st, 2020. Got the answer? Now think about how you figured that out. Likely you did something like this: take today's date, add one to the year, and keep the same month and day. January 1st, 2020 becomes January 1st, 2021.
If you did that, congratulations! You have just created a leap year bug! Don't feel bad, even the very best software engineers sometimes do this.
Not getting it? OK, let's try that exact same process again, but this time let's say that "today" is leap day, February 29th, 2020. Adding one to the year and keeping the same month and day gives February 29th, 2021.
See the problem? The resulting date does not exist. 2021 is not a leap year, it is a common year and thus February only has 28 days in that year.
As a human being, if you were looking for February 29th on the calendar and it wasn't there, likely you would just pick February 28th and move on to better things. Computers, however, only do exactly what we tell them to do, and in many programming languages an attempt to create an invalid date results in an error.
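Leap year bugs typically fall into two impact categories. Category 1 bugs lead to error conditions, such as exceptions, error return codes, or crashes. Category 2 bugs lead to incorrect data, such as dates or durations that are off by a day.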
It's generally Category 1 bugs, such as the one described above, that are the most concerning. These are the type that are responsible for cloud service outages, bricked personal media players, airport baggage handling mishaps, and catastrophic industrial equipment failures.
Category 2 bugs do not typically lead to outages, but that doesn't mean they aren't important. They occur when "365 days" is substituted in place of a year – akin to calling a month "30 days". Most of us know that not every month has 30 days. It may suffice as an approximation, but is no good when you need a precise answer. Likewise, adding 365 days is fine if you need "about a year", perhaps for an expiration date. However, it won't lead to correct results if you're calculating financial reports or figuring out how many years old someone is.
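A small Python sketch makes the Category 2 difference concrete; the helper names are hypothetical. Adding 365 days to January 1st, 2020 lands on December 31st, 2020, because 2020 has 366 days, and dividing elapsed days by 365 can report an age that is too high shortly before a birthday.

```python
from datetime import date, timedelta


def add_365_days(d: date) -> date:
    # Category 2 approximation: treats "a year" as exactly 365 days
    return d + timedelta(days=365)


def naive_age(birth: date, on: date) -> int:
    # Category 2 bug: divides elapsed days by 365, ignoring leap days
    return (on - birth).days // 365


def age_in_years(birth: date, on: date) -> int:
    # Count whole calendar years, minus one if this year's birthday hasn't happened yet
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


if __name__ == "__main__":
    print(add_365_days(date(2020, 1, 1)))  # 2020-12-31, not 2021-01-01
    birth, today = date(2001, 1, 2), date(2021, 1, 1)
    print(naive_age(birth, today), age_in_years(birth, today))  # 20 19
```

Note that `age_in_years` treats a leapling's birthday as March 1st in common years; some systems choose February 28th instead, and that is a business decision worth making explicitly.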
Leap year bugs can be incredibly challenging to find, especially in large projects. It's even harder if you need to sift through billions of lines of source code across many different divisions of a company like Microsoft. However daunting the task, we've been taking leap year readiness quite seriously in order to reduce the risk of impact to our customers on Feb 29th. Over the past 9 months or so, I've been working with a small team of engineers in Azure to prepare the company for leap year. We've been searching through source code looking for potential leap year bugs, through both manual and automated approaches. We started within Azure, building upon some of the efforts from 2016's leap year, then expanded our scope to include a much larger portion of Microsoft. Tens of thousands of source code repositories have now been scanned, and are continually being scanned as new code is written by product teams.
A large part of the challenge is positive identification. Many of the items we find may match patterns associated with leap year bugs, but turn out to be benign. For example, we might detect that a year of a date is being incremented but miss that the day is always the 1st of a month – in which case there is no problem. We also tend to find more issues in test code than in code that makes its way into a product or service. Tests can still be important, as many teams depend on passing tests in order to build and release updates. In order to address these challenges, we've enlisted the aid of thousands of Microsoft engineers on each of the product or service teams. They examine our findings, triage them, and take action where necessary. Through this tedious process, we have identified and repaired many leap year bugs well before they can become an issue for customers.
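To give a flavor of why automated scanning needs human triage, here is a deliberately naive Python sketch of a pattern scan for the kind of C# construction shown in the examples that follow (`new DateTime(x.Year + n, ...)`). It is an illustration only, not a description of any real scanner, and the regular expression is an assumption. Note how a line whose day is always 1 matches just as readily as a genuine bug.

```python
import re
from typing import List

# Illustrative pattern only: a DateTime or DateTimeOffset constructor whose
# first argument is "<something>.Year" followed by + or -.
SUSPICIOUS = re.compile(r"new\s+DateTime(?:Offset)?\s*\(\s*[\w.]+\.Year\s*[+-]")


def find_suspects(source: str) -> List[int]:
    """Return the 1-based line numbers of lines matching SUSPICIOUS."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if SUSPICIOUS.search(line)]


if __name__ == "__main__":
    sample = "\n".join([
        "DateTime dt = DateTime.Today;",
        "DateTime result = new DateTime(dt.Year + 1, dt.Month, dt.Day);",
        "DateTime ok = dt.AddYears(1);",
        "DateTime firstOfMonth = new DateTime(dt.Year + 1, dt.Month, 1);",
    ])
    # Line 4 is a false positive: day 1 always exists, so a human must triage it
    print(find_suspects(sample))  # [2, 4]
```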
Azure Engineering has fully audited the Azure services and supporting services, libraries, and operating systems for leap year issues. While the code has been reviewed extensively, we also recognize that this is not a foolproof process and one approach is not enough. As we want to protect our customers in every way possible, we have prepared the company through multiple avenues including:
Here is an example of .NET code containing a leap year bug, written in C#, that uses the DateTime structure. It is trying to add a year to today, but it is doing it in a way that doesn't account for February 29th.
DateTime dt = DateTime.Today;
// Bug: throws ArgumentOutOfRangeException when dt is February 29th
DateTime result = new DateTime(dt.Year + 1, dt.Month, dt.Day);
There are variations of this of course. Perhaps the data type is a DateTimeOffset structure. Perhaps the date doesn't originate with Now or Today but comes from a stored value. Perhaps the number of years being added or subtracted is variable. In all variations, the problem is the same – when the result is a date that doesn't exist, an exception is thrown. (Specifically, an ArgumentOutOfRangeException.)
For .NET, the solution is simple. Just use the AddYears method:
DateTime dt = DateTime.Today;
DateTime result = dt.AddYears(1);
The AddYears method is built in, and works by doing what I said a human might do. It extends the logic to ensure the resulting value is valid. If not, it adjusts to the last day of the month (February 28th).
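Python's `datetime` module has no `AddYears` equivalent. A minimal standard-library sketch of the clamping behavior described above, assuming you want February 29th to fall back to February 28th in common years, might look like this (`add_years` is a hypothetical helper name):

```python
import calendar
from datetime import date


def add_years(d: date, years: int) -> date:
    """Add whole years, moving to the last day of the month if needed."""
    target_year = d.year + years
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(target_year, d.month)[1]
    return d.replace(year=target_year, day=min(d.day, days_in_month))


if __name__ == "__main__":
    print(add_years(date(2020, 2, 29), 1))  # 2021-02-28
    print(add_years(date(2020, 2, 29), 4))  # 2024-02-29
```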
Now let's take a look at some C++ code that exhibits the same problem, using the Windows SYSTEMTIME structure.
SYSTEMTIME st;
GetSystemTime(&st);
st.wYear++;
At this point, the st variable could contain an invalid date, such as February 29, 2021.
Unlike .NET's DateTime structure, the Windows SYSTEMTIME structure doesn't care if it represents a valid date or not. It is just a plain data type containing separate integers for year, month, day, and so on. It is not until the structure is used by a function that these fields are recomposed into a logical date, and it's up to that function to decide how to deal with invalid dates.
So what is the solution for this C++ code? Well, unlike .NET, there is no built-in AddYears method to call. Instead, one has to ensure the structure has a valid date.
Consider the following:
bool isLeapYear = st.wYear % 4 == 0 && (st.wYear % 100 != 0 || st.wYear % 400 == 0);
st.wDay = st.wMonth == 2 && st.wDay == 29 && !isLeapYear ? 28 : st.wDay;
That may look like a lot, but what it's doing makes sense: First it tests if the year is a leap year. If it's not, and it's February 29th, it moves it back to February 28th.
Alternatively, one might consider converting to a FILETIME, adding 365 days, then converting back to a SYSTEMTIME. If doing so, ask yourself if it's ok for the result to sometimes be off by a day.
One area where you might find a Category 1 leap year bug is when preparing valid from/to dates for certificates. Certificates must have valid dates on them, so passing invalid ones will likely fail certificate generation processes. One such API in Windows is CertCreateSelfSignCertificate, which creates a "self-signed" certificate, and is a very common place for a leap year bug. If you have code that generates certificates (via any mechanism), you should carefully examine the logic used for determining their validity dates.
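Whatever certificate API you call, it can help to compute and validate the validity window in your own code first. Here is a hypothetical Python sketch, `cert_validity`, that falls back to February 28th when the end year is a common year; the function name and the fallback policy are assumptions for illustration, not part of any certificate API.

```python
from datetime import datetime, timezone
from typing import Tuple


def cert_validity(start: datetime, years: int) -> Tuple[datetime, datetime]:
    """Return a hypothetical (not_before, not_after) pair for a certificate.

    datetime.replace raises ValueError for February 29th in a common year,
    so catch that here instead of letting an invalid end date reach the
    certificate-generation code.
    """
    try:
        not_after = start.replace(year=start.year + years)
    except ValueError:
        # For in-range years, only February 29th can fail; use February 28th
        not_after = start.replace(year=start.year + years, day=28)
    return start, not_after


if __name__ == "__main__":
    start = datetime(2020, 2, 29, 12, 0, tzinfo=timezone.utc)
    print(cert_validity(start, 1))
```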
Another area where leap year bugs are common is when dealing with anniversary dates, such as birth dates. Did you know that a person born on a leap day is called a "leapling"? I've certainly heard stories from leaplings describing how bothersome it is when some computer system won't accept their birth date, but the risk goes beyond that.
Imagine you have a user's date of birth, or date of employment, or date of first login to your website stored in a database. Maybe every year you want to send them a "happy birthday" or "happy anniversary" email, or perhaps an invoice. A common way to figure out when to do that might be to take the month and day from the user and apply it to the current year. If you do just that, then you have a leap year bug that will occur whenever the current year is not a leap year!
Let's see that in C# code:
// Bug: throws ArgumentOutOfRangeException in a common year when birthDate is February 29th
DateTime birthdayThisYear = new DateTime(DateTime.Now.Year, birthDate.Month, birthDate.Day);
One approach to remedy the problem would be to figure out the number of days in the month and adjust, like this:
int year = DateTime.Now.Year;
int month = birthDate.Month;
int day = birthDate.Day;
int daysInMonth = DateTime.DaysInMonth(year, month);
if (day > daysInMonth)
{
    day = daysInMonth;
}
DateTime birthdayThisYear = new DateTime(year, month, day);
Another approach, which is arguably simpler, is to just add the difference in years like this:
DateTime birthdayThisYear = birthDate.AddYears(DateTime.Now.Year - birthDate.Year);
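The same idea carries over to the question most reminder jobs actually ask: when is the next anniversary on or after today? Here is a Python sketch using the DaysInMonth-style clamp; `anniversary_in_year` and `next_anniversary` are hypothetical names, and choosing February 28th rather than March 1st for leaplings is an assumption you may want to revisit.

```python
import calendar
from datetime import date


def anniversary_in_year(original: date, year: int) -> date:
    # Same idea as the DaysInMonth approach: clamp to the month's last day
    days_in_month = calendar.monthrange(year, original.month)[1]
    return date(year, original.month, min(original.day, days_in_month))


def next_anniversary(original: date, today: date) -> date:
    """Return the first anniversary of `original` on or after `today`."""
    candidate = anniversary_in_year(original, today.year)
    if candidate < today:
        candidate = anniversary_in_year(original, today.year + 1)
    return candidate


if __name__ == "__main__":
    leapling = date(2000, 2, 29)
    print(next_anniversary(leapling, date(2021, 1, 15)))  # 2021-02-28
    print(next_anniversary(leapling, date(2023, 3, 1)))   # 2024-02-29
```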
Really, a leap year bug might be anywhere that dates are being manipulated by some logic, not just these.
There are other common patterns as well. I described some in my blog post from last leap year, and I've also started documenting some of them in this Stack Overflow question (which you can contribute to!).
Here are a few to watch out for:
Leap year bugs can be introduced in any programming language. However, some languages are not as prone to Category 1 bugs as others.
For example, in JavaScript:
var d = new Date();
d.setFullYear(d.getFullYear() + 1);
This code won't fail on leap day, but the Date object will advance to March 1st.
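If you would rather match JavaScript's roll-over behavior than clamp to February 28th, the same policy can be expressed in Python. This is a sketch with a hypothetical helper name; whether a leap day anniversary should land on February 28th or March 1st is a business decision, not a technical one.

```python
from datetime import date, timedelta


def add_years_rollover(d: date, years: int) -> date:
    """Roll invalid dates forward, like JavaScript's Date: Feb 29 -> Mar 1 in a common year."""
    target = d.year + years
    # The 1st of the month always exists; add the remaining days and let any
    # excess spill over into the following month.
    return date(target, d.month, 1) + timedelta(days=d.day - 1)


if __name__ == "__main__":
    print(add_years_rollover(date(2020, 2, 29), 1))  # 2021-03-01
```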
Other languages like Python are prone to Category 1 impact bugs. For example:
from datetime import date
today = date.today()
later = date(today.year + 1, today.month, today.day)
That will raise a ValueError when run on a leap day.
With either of these languages, the best option may be to use a well-tested open source library for date manipulation.
February 29th is not the only date that can cause problems. December 31st can too, as it is the 366th day of a leap year, and some applications may be hard-wired for 365 days.
Additionally, not all date manipulation happens around the current date, so a leap year bug could be encountered on any day of the year. It's just that a lot of code works with "today" as a basis, so leap year bugs are more likely to become visible on leap day.
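One place the 366th day bites is per-day storage sized for 365 entries. Here is a Python sketch with hypothetical names that sizes its buckets from the actual year length; `timetuple().tm_yday` is the standard-library day-of-year, where January 1st is day 1.

```python
from datetime import date
from typing import Iterable, List


def days_in_year(year: int) -> int:
    # 366 in a leap year, 365 otherwise
    return (date(year + 1, 1, 1) - date(year, 1, 1)).days


def daily_counts(year: int, events: Iterable[date]) -> List[int]:
    """Count events per day of `year`, ignoring events from other years."""
    # A hard-coded [0] * 365 would raise IndexError for December 31st of a leap year
    counts = [0] * days_in_year(year)
    for d in events:
        if d.year == year:
            counts[d.timetuple().tm_yday - 1] += 1
    return counts


if __name__ == "__main__":
    print(date(2020, 12, 31).timetuple().tm_yday)  # 366
```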
One excellent way to ensure your code is free of leap year bugs is unit testing with the "Virtual Clock" pattern (also known as "Mock the Clock"). The general idea is to treat the system clock as a service, rather than as a simple property or method call. You can then substitute a fake implementation of this service in tests, in the same way that you might for any other service, such as one that makes a network call. The advantage is that you can prove deterministically that your code is resilient to a variety of dates, before or after those dates actually come about.
Note that this pattern takes several different forms and can vary slightly per language. Also, it already exists in several open source libraries, such as Noda Time for .NET. It can also be implemented manually.
Here is an example implementation in C#:
using System;
// Here is the clock interface
public interface IClock
{
    DateTimeOffset GetCurrentTime();
}
// This implementation will be used at runtime
public class SystemClock : IClock
{
    public DateTimeOffset GetCurrentTime()
    {
        return DateTimeOffset.UtcNow;
    }
}
// This implementation will be used during testing
public class FakeClock : IClock
{
    private readonly DateTimeOffset _value;
    public FakeClock(DateTimeOffset value)
    {
        _value = value;
    }
    // Convenience constructor: midnight UTC on the given date
    public FakeClock(int year, int month, int day)
    {
        _value = new DateTimeOffset(year, month, day, 0, 0, 0, TimeSpan.Zero);
    }
    public DateTimeOffset GetCurrentTime()
    {
        return _value;
    }
}
With these defined, you can now depend on the IClock interface in your application components:
using System.Globalization;
// Example usage, via constructor injection
public class SomeComponentOfMyProgram
{
    private readonly IClock _clock;
    public SomeComponentOfMyProgram(IClock clock)
    {
        _clock = clock;
    }
    // a simple example
    public string GetOneYearFromToday()
    {
        var now = _clock.GetCurrentTime();
        // AddYears moves February 29th to February 28th in a common year
        return now.AddYears(1).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}
At runtime, one might wire this up through a dependency injection container, like so:
services.AddSingleton<IClock, SystemClock>();
But in unit tests, we can use the FakeClock:
using Xunit;
[Fact]
public void CanGetOneYearFromNormalDate()
{
    var clock = new FakeClock(2020, 1, 1);
    var component = new SomeComponentOfMyProgram(clock);
    var result = component.GetOneYearFromToday();
    Assert.Equal("2021-01-01", result);
}
[Fact]
public void CanGetOneYearFromLeapDay()
{
    var clock = new FakeClock(2020, 2, 29);
    var component = new SomeComponentOfMyProgram(clock);
    var result = component.GetOneYearFromToday();
    Assert.Equal("2021-02-28", result);
}
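For readers working in Python, the same pattern might be sketched with a `typing.Protocol` in place of the interface. The class and method names here are hypothetical counterparts of the C# example, and the test functions follow pytest's `test_` naming convention.

```python
import calendar
from datetime import date, datetime, timezone
from typing import Protocol


class Clock(Protocol):
    def today(self) -> date: ...


class SystemClock:
    # Used at runtime; mirrors the C# example's use of UTC
    def today(self) -> date:
        return datetime.now(timezone.utc).date()


class FakeClock:
    # Used in tests; always returns the date it was constructed with
    def __init__(self, value: date) -> None:
        self._value = value

    def today(self) -> date:
        return self._value


class SomeComponent:
    def __init__(self, clock: Clock) -> None:
        self._clock = clock

    def one_year_from_today(self) -> str:
        now = self._clock.today()
        target = now.year + 1
        # Clamp February 29th to February 28th, like .NET's AddYears
        last_day = calendar.monthrange(target, now.month)[1]
        return now.replace(year=target, day=min(now.day, last_day)).isoformat()


def test_one_year_from_normal_date() -> None:
    assert SomeComponent(FakeClock(date(2020, 1, 1))).one_year_from_today() == "2021-01-01"


def test_one_year_from_leap_day() -> None:
    assert SomeComponent(FakeClock(date(2020, 2, 29))).one_year_from_today() == "2021-02-28"
```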
One might think that the easiest thing to do would be to create an environment where we turn the clock forward to February 29th and see what fails. While this might work for small individual programs, it's usually not a viable option for distributed systems, because time is so interwoven with every dependency. For example, does your application send telemetry to a logging service? If so, how will that service handle events with timestamps from the future? Perhaps it will discard them, which might make your system appear as if it is offline. As a more concrete example, consider that most web applications require SSL certificates that are signed by a trusted certificate authority, and those certificates carry validity start and end dates. What will happen if your certificate is expired when your tests run? Lastly, consider that many authentication protocols, such as Kerberos and OpenID Connect, include timestamping and validation of timestamps as a security mechanism. Will your system behave correctly if authentication fails? Even if none of these things matter to you, are you sure that you have the ability to exercise all of your code in a way that would highlight any problems? What if the result is simply a Category 2 bug, as described earlier? How would you catch that?
We have actually spent considerable effort researching this capability, and what we have learned is that time-forward testing is only a viable approach for testing a single machine running in isolation. We don't recommend it for modern cloud infrastructure and applications.
If you do decide to try it anyway, be sure to first disable all time synchronization services (NTP, w32tm, Hyper-V's VMIC Time Provider, etc.) – otherwise your clock might simply correct itself during your tests.
Leap year bugs aren't always disastrous, but they certainly can be, and leap day is just around the corner. Please take a few minutes now to go look at any code you might have in your application that manipulates dates. Think about what will happen when one of those dates is February 29th. If possible, go implement Virtual Clock in your tests.
Also, talk to your colleagues about leap day bugs. This isn't specialized or difficult knowledge, and leap year bugs certainly have happened many times before, but somehow we keep forgetting. Hopefully this time we can all work together to make leap day a fantastic non-event.