Handling DST switch
Timezones in Java application and Postgres Database
Background
I have a Spring boot application which uses Spring Batch to pull data from bunch of APIs. Say for a lot of reasons, this app should be running with default timezone set to EST5EDT.
What is EST5EDT?
It switches between EST to EDT whenever DST (Daylight Saving Time) is on.
UTC is 5 hours ahead of EST and 4 hours ahead of EDT
For Example:
Newyork DST for 2020 :
March 8 2020 02:00 is when we set the clock one hour forward
November 1 2020 02:00 is when we set the clock one hour backward
i.e. 1:00 to 2:00 will appear in both EDT and EST.
As mentioned my application runs with default timezone EST5EDT and it stores the Date value into Postgres’s TIMESTAMP datatype. There are two Timestamp types in Postgres — timestamp with timezone & timestamp without timezone. We will get to it in a minute.
I have written tests that persist the above Dates where timezone shifts, from this app to Postgres table with TIMESTAMP column, read it back and assert if they are same.
Timestamp datatype in postgres does not care which timezone it is. It just stores the value.
So when I send “20201101 01:05:00 EDT”, it only stores “20201101 01:05:00” in postgres. When application, which is running with EST5EDT reads it back, it reads it as “20201101 01:05:00 EST” and this is where assertion fails.
How to solve this?
The first solution you will see online is to run your application with UTC timezone. That way only UTC is getting stored in the database in the first place and you do not have to worry about timezone issues. Let’s not go into the debate of whether all applications should be running as UTC. But say for the sake of this discussion, that is not an option.
How about storing the fact that the date you inserted is in EDT?
This is where postgres’s TIMESTAMPTZ datatype comes into picture. It means Timestamp with timezone.
But postgres allocates both TIMESTAMP and TIMESTAMPTZ the same 8 bytes. So how does TIMESTAMPTZ store that “extra timezone” information.
It does not. It simply converts whatever you pass into UTC internally.
If you are writing “20201101 01:05:00 EDT” to Postgres, both TIMESTAMPTZ and TIMESTAMP sends it as “20201101 01:05:00 +4:00” —i.e. with offset attached.
TIMESTAMP type discards the offset “+4” and TIMESTAMPTZ converts this to UTC — “20201101 05:05:00 UTC” and stores it.
So when you are doing a select statement on psql or any client like dbeaver you should see the UTC time right? Well, no.
This is where the timestamp of the postgres server comes into picture.
If the timezone of the postgres server is set to UTC, yes you will see the TIMESTAMPTZ value as“20201101 05:05:00”
For example, mine is set to IST so I see “20201101 10:35:00” (IST is 5:30 hrs ahead of UTC).
show timezone;
Note that postgres server timezone is irrelevant to TIMESTAMP though. Explained clearly here (excerpt below). So if I used TIMESTAMP column and inserted “20201101 01:05:00 +4:00”, it would have been stored as “20201101 01:05:00” (offset ignored) and even if your postgres timezone is in “IST”, you would still see “20201101 01:05:00”
TIMESTAMP WITHOUT TIME ZONE
stores local date-time (aka. wall calendar date and wall clock time). Its time zone is unspecified as far as PostgreSQL can tell (though your application may knows what it is). Hence, PostgreSQL does no time zone related conversion on input or output. If the value was entered into the database as'2011-07-01 06:30:30'
, then no mater in what time zone you display it later, it will still say year 2011, month 07, day 01, 06 hours, 30 minutes, and 30 seconds (in some format). Also, any offset or time zone you specify in the input is ignored by PostgreSQL, so'2011-07-01 06:30:30+00'
and'2011-07-01 06:30:30+05'
are the same as just'2011-07-01 06:30:30'
. For Java developers: it's analogous tojava.time.LocalDateTime
.
Conclusion
Final fix I put was to change all TIMESTAMP datatype to TIMESTAMPTZ. I could have added a type handler that converts all of the Date to UTC before persisting to postgres but since I do not have much control over the spring batch DAO, I could not do that. The reason I needed to alter the spring batch table schemas was because one of the table stores the job parameters of type date which is of interest to the application itself. Also I didn’t like to have the application data in UTC and batch job metadata (like start time, end time, etc) in EST5EDT.
CREATE TABLE BATCH_JOB_EXECUTION_PARAMS (
JOB_EXECUTION_ID BIGINT NOT NULL ,
TYPE_CD VARCHAR(6) NOT NULL ,
KEY_NAME VARCHAR(100) NOT NULL ,
STRING_VAL VARCHAR(250) ,
DATE_VAL TIMESTAMPTZ DEFAULT NULL ,
LONG_VAL BIGINT ,
DOUBLE_VAL DOUBLE PRECISION ,
IDENTIFYING CHAR(1) NOT NULL ,
constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
) ;
Since I have control over the Spring batch table schema, I just directly changed the datatype to TIMESTAMPTZ.
I have committed my test setup here. Readme has enough instructions to play around with Postgres timezone and the Timestamp types.