Rotary encoding DATETIME for machine learning in BigQuery

Paul Leroy
4 min read · Oct 8, 2023


One of the most impactful algorithms of recent years is the transformer model. It is the architecture behind the five-year-old BERT, GPT, and most of the current large language models (LLMs). If you know the model, you'll have seen the rotary encoder sitting after the tokenizer on the encoder side, and the reverse (a rotary decoder, I suppose) on the decoder side. The function of the rotary encoder is to give importance to the position of a word in a sequence. It also has the useful property of preserving the relationship to neighboring words. It's not the only place I've seen these used: I encountered one in a BigQuery taxi fare prediction algorithm when processing weekdays.

Here is the key point when doing any machine learning with time series: dates and datetimes are useless in machine learning as-is. If I want to know what happened on a date in the training data, I can just read the newspaper. Temporal information has to be engineered to be useful. The day of the week is the easiest to understand: every 7 days certain things are going to happen, every Monday, for instance, most people go to work. The common way of dealing with the day of the week is to either index the day (1 for Monday, 2 for Tuesday, and so on) or one-hot encode it ([1,0,0,0,0,0,0] for Monday). Indexing is a bad way of doing this: what makes Friday more important than Monday (5 is bigger than 1)? Keep in mind that machine learning only ever works on numbers, so we have to find creative ways of turning categorical information into numbers.
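To make the two conventional options concrete, here is a minimal Python sketch (the variable names are my own, not from any library) showing index encoding and one-hot encoding of the weekday, and the false ordering that indexing implies:

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Index encoding: Monday=1 ... Sunday=7
index_encoding = {day: i + 1 for i, day in enumerate(days)}

# One-hot encoding: a 7-wide vector with a single 1
one_hot = {day: [1 if j == i else 0 for j in range(7)]
           for i, day in enumerate(days)}

# The index encoding implies Friday "outranks" Monday, which is meaningless
print(index_encoding["Fri"] > index_encoding["Mon"])  # True
print(one_hot["Mon"])  # [1, 0, 0, 0, 0, 0, 0]
```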

Now, I am going to caveat my solution: it is designed for the taxi data toy problem, which has two useful cycles in the data, the hour of the day and the day of the week. One-hot encoding would give 7 + 24 = 31 parameters. A better way would be feature crossing, hour crossed with weekday, but this results in 7 × 24 = 168 parameters. The more parameters I have, the more processing power is required to build and infer on a model. Also, both mechanisms lose the important information that Monday is next to Sunday and Tuesday, and that 23:00 is next to 22:00 and 00:00. It's obvious looking at it now, right?

Enter the rotary encoder. There is usually a cyclic nature to a lot of this data. Sunday (if you consider it the last day of the week, I don't judge) has two neighbors, Monday and Saturday. Sunday is also the furthest point from the midnight between Wednesday and Thursday, and Wednesday sits opposite the middle of the weekend, so there is now a data science reason for it! I wondered, firstly, whether this could be written efficiently in BigQuery and, secondly, whether it could be generalized enough to be useful.

If you use this mechanism, you will need to be aware of any cycles in your data. Industrial Internet of Things data may have tighter cycles on top of the usual daily ones. It also leads to an idea for an interesting follow-on article: building a rotary autoencoder using Fourier transforms.

First we need a general rotary encoder. This simply takes the value (the hour of the day) and the range of the hours and plots it on a circle. Two ways of thinking about this are: taking a 24-hour line and wrapping it around a circle, or using a 24-hour analog clock. The output of this function is an x and y value. The input would be the current hour, with a range of 24.

(x, y) = ( cos(2π · input / range), sin(2π · input / range) )

I don’t really think the x and y order makes much of a difference to the model, as long as your decoder works the same way.
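The formula above can be sketched in a few lines of Python (function names are my own). The point to verify is the neighbor-preserving property: Sunday comes out equidistant from Monday and Saturday on the circle, which neither indexing nor one-hot encoding gives you.

```python
import math

def rotary_encode(value, cycle):
    """Map a value in [0, cycle) onto the unit circle as (x, y)."""
    angle = 2 * math.pi * value / cycle
    return (math.cos(angle), math.sin(angle))

def dist(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Days indexed Monday=1 ... Sunday=7, on a cycle of 7
mon = rotary_encode(1, 7)
sat = rotary_encode(6, 7)
sun = rotary_encode(7, 7)

# Sunday is equally close to Monday and Saturday: the wrap-around survives
print(abs(dist(sun, mon) - dist(sun, sat)) < 1e-9)  # True
```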

Now you can look at your data, decide which elements have cycles, and break them into those vectors. When I looked at the taxi data there is an annual cycle (holiday periods), monthly cycles (payday and going out), weekly cycles (work versus the weekend), and daily cycles (peak versus off-peak). I decided to encode the daily cycle as a more continuous vector: the number of seconds elapsed over 86,400 seconds (the total for an average day). I also adjusted the monthly cycle (February versus December) and the annual cycle (leap years) to the actual month length and year length.
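The per-cycle choices above can be sketched in Python as a rough equivalent of what the BigQuery function later in this article does (the names `rotary_encode` and `date_to_vect` are my own; this is an illustration, not the production code):

```python
import calendar
import math
from datetime import datetime

def rotary_encode(value, cycle):
    """Map a value in [0, cycle) onto the unit circle as (x, y)."""
    angle = 2 * math.pi * value / cycle
    return (math.cos(angle), math.sin(angle))

def date_to_vect(dt):
    """Encode the annual, monthly, weekly and daily cycles of a datetime."""
    days_in_year = 366 if calendar.isleap(dt.year) else 365   # leap-year adjustment
    days_in_month = calendar.monthrange(dt.year, dt.month)[1]  # Feb vs Dec adjustment
    seconds_of_day = dt.hour * 3600 + dt.minute * 60 + dt.second
    return {
        "day_of_year":  rotary_encode(dt.timetuple().tm_yday, days_in_year),
        "month":        rotary_encode(dt.month, 12),
        "time_of_day":  rotary_encode(seconds_of_day, 86400),
        "day_of_week":  rotary_encode(dt.isoweekday(), 7),
        "day_of_month": rotary_encode(dt.day, days_in_month),
    }

vec = date_to_vect(datetime(2023, 10, 8, 12, 0, 0))
# Noon lands on the opposite side of the circle from midnight
print(round(vec["time_of_day"][0]), round(vec["time_of_day"][1]))  # -1 0
```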

I haven’t written a decoder because the taxi problem doesn’t require one, but the problem does require the datetime to be encoded the same way at prediction time.
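If you ever did need a decoder, the inverse is straightforward: recover the angle from (x, y) with atan2 and scale it back to the original range. A hedged Python sketch (my own helper names, not part of the article's SQL):

```python
import math

def rotary_encode(value, cycle):
    """Map a value in [0, cycle) onto the unit circle as (x, y)."""
    angle = 2 * math.pi * value / cycle
    return (math.cos(angle), math.sin(angle))

def rotary_decode(xy, cycle):
    """Recover the original value from its (x, y) encoding."""
    angle = math.atan2(xy[1], xy[0])           # result is in (-pi, pi]
    angle = angle % (2 * math.pi)              # shift into [0, 2*pi)
    return angle * cycle / (2 * math.pi)

print(round(rotary_decode(rotary_encode(17, 24), 24)))  # 17
```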

If you’re interested in the algorithm itself then here it is:

-- General rotary encoder
CREATE TEMP FUNCTION RotaryEncode(x FLOAT64, limits FLOAT64)
RETURNS STRUCT<y0 FLOAT64, y1 FLOAT64>
AS (
  STRUCT(
    SIN(x / limits * 2 * ACOS(-1)),
    COS(x / limits * 2 * ACOS(-1))
  )
);

-- DateTime to Vector
CREATE TEMP FUNCTION date_to_vect(x DATETIME)
RETURNS STRUCT<
  day_of_year STRUCT<y0 FLOAT64, y1 FLOAT64>,
  month STRUCT<y0 FLOAT64, y1 FLOAT64>,
  time_of_day STRUCT<y0 FLOAT64, y1 FLOAT64>,
  day_of_week STRUCT<y0 FLOAT64, y1 FLOAT64>,
  day_of_month STRUCT<y0 FLOAT64, y1 FLOAT64>>
AS (
  STRUCT(
    RotaryEncode(EXTRACT(DAYOFYEAR FROM x), EXTRACT(DAYOFYEAR FROM LAST_DAY(x, YEAR))),
    RotaryEncode(EXTRACT(MONTH FROM x), 12),
    -- Use x, not CURRENT_DATETIME(), so each row encodes its own timestamp
    RotaryEncode(TIME_DIFF(TIME(x), "00:00:00.000", SECOND), 86400),
    RotaryEncode(EXTRACT(DAYOFWEEK FROM x), 7),
    RotaryEncode(EXTRACT(DAY FROM x), EXTRACT(DAY FROM LAST_DAY(x, MONTH)))
  )
);

-- Test
SELECT date_to_vect(CURRENT_DATETIME()) AS date_vector;

The cool thing is that the datetime cycles are encoded with most of the information you need (see the earlier comment on knowing the cycles in your data), and the result is only 10 values wide, which is even smaller than one-hot encoding. That also makes it practical to put a deep neural network on top to pick up complex interplay in the temporal data. I’m not sure whether the struct format or a plain array output is more useful, but feel free to adjust it to your use case.
