Python Data Validation: Date & Time

Regardless of language, handling dates and times is trickier than handling simple numbers and strings. Even within the Gregorian calendar there is a wide range of formats, in addition to multiple time zones and daylight saving (summer time) corrections. To complicate things further, the corrections depend on the date, and the changeover dates differ from year to year.

The transfer of date/time information is greatly simplified by using ISO 8601 standard formatting. Often referred to simply as an “ISO date”, this uses a most-to-least-significant order and limits the characters to the decimal digits, “-”, “:”, the letters “T”, “W” and “Z”, a “+” for time zone offsets, and a decimal mark for fractional seconds.

A typical ISO date and time is “2019-08-29T11:12:34Z”, which specifies 11:12:34 am on the 29th of August 2019, Universal Time (UTC). The month, day, hour, minute and second fields are always zero-padded to two digits. The “T” acts as a separator between the date and time, and the “Z” signifies Universal Time. A “W” can be used to specify a week: for example, “2019-W31-1” is the first day (Monday) of the 31st week of 2019. The seconds field can carry a decimal fraction.
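
As a rough illustration using only the standard library, the sketch below builds a UTC timestamp, renders it in ISO 8601 form, and resolves the week date 2019-W31-1 to a calendar date. It assumes Python 3.8 or later, because date.fromisocalendar() was added in that release; the variable names are arbitrary.

```python
from datetime import date, datetime, timezone

# 11:12:34 on 29 August 2019, Universal Time
stamp = datetime(2019, 8, 29, 11, 12, 34, tzinfo=timezone.utc)

# isoformat() writes the UTC offset as "+00:00" rather than "Z"
iso_text = stamp.isoformat()
print(iso_text)

# Week date 2019-W31-1: ISO year 2019, week 31, day 1 (Monday)
week_start = date.fromisocalendar(2019, 31, 1)
print(week_start.isoformat())
```

Both “Z” and “+00:00” denote the same offset, so either spelling may appear in data you receive.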

A different time zone can be specified by replacing the “Z” with a “+” or “-” followed by the offset from Universal Time as hours and optionally minutes, formatted as “hh”, “hhmm”, or “hh:mm”. Many people and programs assume the local time zone when none is given, but this is dangerous: the sender and recipient may not be in the same time zone or under the same daylight saving rules.
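
The difference between a value with and without an offset can be sketched with the standard library alone. This example assumes Python 3.7 or later for datetime.fromisoformat(); the variable names are only for illustration.

```python
from datetime import datetime, timedelta, timezone

# A naive value carries no offset, so its meaning depends on who reads it
naive = datetime.fromisoformat('2019-08-01T06:30:00')

# An aware value states its offset explicitly ("+06:00" here)
aware = datetime.fromisoformat('2019-08-01T06:30:00+06:00')

print(naive.tzinfo)        # None
print(aware.utcoffset())   # 6:00:00

# Aware values convert to UTC unambiguously
as_utc = aware.astimezone(timezone.utc)
print(as_utc.isoformat())
```

Only the aware value can be converted reliably, which is why an explicit offset is worth insisting on when data crosses system boundaries.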

Parsing ISO Dates

Python’s datetime.strptime() method can be used to parse dates. It is often a good choice if you know exactly what format and time zone the date/time string is in. For example:

from datetime import datetime

strval = '2019-08-01'
y = datetime.strptime(strval, '%Y-%m-%d')
print(y)

The full list of date/time format strings can be found in the official Python documentation. Notice that this assumes a specific month/day order and lacks any time zone information.
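
When the format is known to include a time and an offset, strptime() can handle that too. The following is a minimal sketch, assuming the input always uses the exact layout held in the hypothetical fmt variable; any other layout raises ValueError.

```python
from datetime import datetime, timedelta

fmt = '%Y-%m-%dT%H:%M:%S%z'

# %z accepts offsets written as "+hhmm" (and "+hh:mm" or "Z" on Python 3.7+)
stamp = datetime.strptime('2019-08-01T06:30:00+0600', fmt)
print(stamp.utcoffset())

# A string that does not match the format raises ValueError
try:
    datetime.strptime('01/08/2019', fmt)
    matched = True
except ValueError:
    matched = False
print(matched)
```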

Unfortunately, strptime() quickly becomes much more complicated if all you know is that the string conforms to ISO 8601 and may contain any mixture of valid date, time and time zone specifiers. The solution is to use the python-dateutil package, which can be installed with ‘pip install python-dateutil’. Be aware that its general-purpose parse() function is lenient and also accepts many non-ISO formats.

>>> import dateutil.parser

# Basic UTC example
>>> dateutil.parser.parse('2019-08-01T06:30:00Z')
datetime.datetime(2019, 8, 1, 6, 30, tzinfo=tzutc())

# With a different timezone
>>> dateutil.parser.parse('2019-08-01T06:30:00+06')
datetime.datetime(2019, 8, 1, 6, 30, tzinfo=tzoffset(None, 21600))

# ISO 8601 'basic' format (i.e. minimal formatting)
>>> dateutil.parser.parse('20190801T063000+06')
datetime.datetime(2019, 8, 1, 6, 30, tzinfo=tzoffset(None, 21600))
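
Because parse() is deliberately forgiving, it may help to compare it with the stricter isoparse() function from the same module. This is a small sketch, assuming python-dateutil 2.7.0 or later (the release that introduced isoparse()); the variable names are illustrative.

```python
from datetime import datetime
from dateutil import parser

# parse() is lenient: it also accepts many non-ISO spellings
loose = parser.parse('1 August 2019')
print(loose)

# isoparse() accepts only ISO 8601 strings
strict = parser.isoparse('2019-08-01T06:30:00Z')
print(strict)

# The same non-ISO text is rejected by isoparse()
try:
    parser.isoparse('1 August 2019')
    accepted = True
except ValueError:
    accepted = False
print(accepted)
```

Which function is appropriate depends on whether the goal is to accept whatever a person might type or to enforce ISO 8601 on incoming data.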

Time Zone Conversion

Time zone conversions can also be complicated, because each zone has its own definition and its own dates for daylight saving (summer time) adjustments, and these rules change from year to year. The solution is to use the pytz library:

>>> from pytz import timezone
>>> from datetime import datetime

# Create a sample datetime
>>> d = datetime(2019,8,1,9,30,0)

# Specify the timezone as US-Central
>>> loc_d = timezone('US/Central').localize(d)
>>> print(loc_d)
2019-08-01 09:30:00-05:00

# Convert to Bangalore timezone
>>> bang_d = loc_d.astimezone(timezone('Asia/Kolkata'))

>>> print(bang_d)
2019-08-01 20:00:00+05:30

Note that US Central standard time is 6 hours behind Universal Time (-06:00). The -05:00 offset shown above appears because daylight saving time applies to August dates.

Performing comparisons and arithmetic across a daylight saving boundary can also produce results that are off by one hour. See the normalize() method of pytz time zone objects for further information and a solution.
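
The off-by-one-hour effect can be sketched as follows, assuming pytz is installed and using the end of US daylight saving time on 3 November 2019 as the boundary. The variable names are illustrative.

```python
from datetime import datetime, timedelta
from pytz import timezone

central = timezone('US/Central')

# 09:30 on the day before US daylight saving time ended in 2019
before = central.localize(datetime(2019, 11, 2, 9, 30))
print(before)

# Plain arithmetic keeps the old -05:00 offset, which no longer applies
shifted = before + timedelta(days=1)
print(shifted)

# normalize() re-applies the zone rules for the new date
fixed = central.normalize(shifted)
print(fixed)
```

The normalized value is the same instant, exactly 24 hours after the start, but expressed with the -06:00 offset, so its local time reads 08:30. If the intent is the same wall-clock time on the following day, one option is to do the arithmetic on the naive value and call localize() again.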

Validating Dates

To validate a date/time string, use one of the above conversion methods and catch the ValueError exception that is raised when the string cannot be parsed:

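import dateutil.parser

# Is sval a valid ISO 8601 date/time?
# Returns True if it is, False if it isn't.
# isoparse() (python-dateutil 2.7.0+) is used rather than parse()
# because parse() also accepts many non-ISO formats.

def validate_iso(sval):
    try:
        dateutil.parser.isoparse(sval)
        return True
    except ValueError:
        return False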
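
Since a missing time zone is a common source of errors, a stricter check might also insist on an explicit offset. The sketch below is one possible variant using only the standard library; validate_aware_iso is a hypothetical name, and datetime.fromisoformat() accepts the full range of ISO 8601 spellings (including a trailing “Z”) only from Python 3.11 onwards, so earlier versions will reject some valid strings.

```python
from datetime import datetime

def validate_aware_iso(sval):
    """Return True only if sval parses and includes a UTC offset."""
    try:
        parsed = datetime.fromisoformat(sval)
    except (TypeError, ValueError):
        return False
    # Naive values (no offset) are rejected rather than assumed local
    return parsed.utcoffset() is not None

print(validate_aware_iso('2019-08-01T06:30:00+06:00'))  # valid, has offset
print(validate_aware_iso('2019-08-01T06:30:00'))        # valid, but naive
print(validate_aware_iso('2019-02-30T06:30:00+06:00'))  # no 30 February
```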

 

Next

So far we have ignored Unicode issues. Unicode is discussed in the next article in this series.