@fuyi
Created March 1, 2022 15:06
Create custom window strategy to fulfil sequencer model feature requirement

Background

To train our neural sequencer model properly, we need the ordered sequence of each user's interactions with articles over a given period of time. The input dataset for model training can be described as:

The last X articles the user has interacted with in the last Y days in chronological order.

X: variable with integer value between [0, 20] inclusive

Y: variable with integer value between [20, 50] inclusive
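
The feature definition above can be sketched in plain Python, independent of Beam. This is only an illustration under assumptions: the function name `last_x_articles`, the `(timestamp, article_id)` event shape, and the inclusive window boundaries are hypothetical and not taken from the actual pipeline.

```python
from datetime import datetime, timedelta


def last_x_articles(events, now, x, y_days):
    """Return up to x article ids from the last y_days, oldest first.

    events: iterable of (timestamp, article_id) pairs for a single user.
    Boundaries are treated as inclusive here (an assumption).
    """
    if x <= 0:
        # X may be 0, in which case the feature is an empty sequence.
        return []
    cutoff = now - timedelta(days=y_days)
    # Keep only events inside the Y-day period, then order them chronologically.
    recent = sorted((ts, article) for ts, article in events if cutoff <= ts <= now)
    # The last x entries are the most recent interactions.
    return [article for _, article in recent[-x:]]


now = datetime(2022, 3, 1)
events = [
    (datetime(2022, 2, 27), "c"),
    (datetime(2021, 12, 1), "old"),
    (datetime(2022, 2, 20), "a"),
    (datetime(2022, 2, 25), "b"),
]
print(last_x_articles(events, now, x=2, y_days=30))  # ['b', 'c']
```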

The problem

To generate this feature, we use an Apache Beam pipeline with a session window strategy. According to Axel's investigation (BTCACE-638: PDP Usecase - identify optimal session window duration, status: closed), we cannot identify an optimal session window gap that fulfils our requirement. We therefore need an alternative.

One alternative is to use a sliding window strategy. This guarantees correct features over the Y-day duration, but because each event is duplicated into every window that contains it, we end up holding roughly Y times the data in memory. This works but is not optimal.
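
To make the duplication concrete, here is a minimal sketch of sliding-window assignment written in plain Python rather than against the Beam API. It assumes a window size of Y days, a slide period of one day, and windows aligned to multiples of the period; the helper name `sliding_window_starts` is hypothetical.

```python
DAY = 24 * 60 * 60  # seconds


def sliding_window_starts(timestamp, size, period):
    """Start times of every window [start, start + size) that contains timestamp.

    Windows are assumed to start at multiples of `period` (offset 0).
    """
    last_start = timestamp - (timestamp % period)
    # Walk backwards one period at a time while the window still covers the timestamp.
    return list(range(last_start, timestamp - size, -period))


y_days = 30
event_time = 100 * DAY + 3600  # one hour into day 100
starts = sliding_window_starts(event_time, size=y_days * DAY, period=DAY)
print(len(starts))  # 30: the event is copied into Y windows
```

Under these assumptions every event lands in size / period windows, which is where the factor of Y comes from.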

Another alternative, and the preferred one, is to create a custom Beam window strategy so that we get only one window per user, covering the last Y days.
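
The assignment rule such a strategy would need can be sketched in plain Python. This is not the Beam API: in Beam the rule would live in the `assign` method of a `WindowFn` subclass, and that wiring is not shown. The name `assign_single_window`, the fixed `reference_time`, and the half-open interval are assumptions made for illustration.

```python
DAY = 24 * 60 * 60  # seconds


def assign_single_window(timestamp, reference_time, y_days):
    """Return the single window (start, end) for an event, or None if out of range.

    The window is the half-open interval [reference_time - Y days, reference_time).
    """
    start = reference_time - y_days * DAY
    if start <= timestamp < reference_time:
        return (start, reference_time)
    # Events older than Y days (or at/after the reference time) get no window.
    return None


reference_time = 100 * DAY
print(assign_single_window(99 * DAY, reference_time, y_days=30))
print(assign_single_window(10 * DAY, reference_time, y_days=30))
```

Each event belongs to at most one window, so no event is duplicated. How the reference time is chosen and how out-of-range events are dropped are open questions for the feasibility study.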

However, we lack experience in this area. In this ticket, we want to determine the feasibility of the custom window strategy.

Definition of Done

  • The feasibility of the custom window strategy is determined.
  • If a custom window strategy is feasible:
    • the solution is proposed, discussed, and agreed with the team
    • the custom window strategy is documented
    • the custom window strategy is implemented and verified in the production environment (can be split into a separate ticket)