@tonyta
Last active August 29, 2015 14:11
Cursor Pagination

As I understand it, there are two types of cursor pagination, which differ in what the cursor represents:

  1. cursor represents a single point in the dataset
  2. cursor represents a contiguous batch of the dataset

Cursor as Single Point

In this type, the cursor points to a specific point (maybe a record) in a dataset. This is the type used by Facebook and the response looks like this:

{
  "data": [ ... ],
  "paging": {
    "cursors": {
      "after": "xyz",
      "before": "abc"
    },
    "next_page": "https://www.example.com/api/cats?after=xyz",
    "prev_page": "https://www.example.com/api/cats?before=abc"
  }
}

In the above, the before cursor "abc" marks the beginning of the page: the point just before the first record of the response set. The after cursor "xyz" marks the point just after the last record. So to retrieve the next page of results, we ask for the records after the last record of the current page, "xyz": https://www.example.com/api/cats?after=xyz.

Asking for https://www.example.com/api/cats?before=xyz or https://www.example.com/api/cats?after=abc retrieves the current page again, because the current page is exactly the set of records between those two points. The before and after parameters are needed to say which side of the point you are requesting records from.
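To make the "point between records" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the in-memory RECORDS list, the fetch_page function, and the readable cursor format ("b10" for the point just before record 10, "a30" for the point just after record 30). A real API would hand out opaque tokens instead.

```python
from bisect import bisect_left, bisect_right

# Hypothetical dataset: record ids in ascending order.
RECORDS = [10, 20, 30, 40, 50, 60, 70]


def position(cursor):
    """Map a point cursor to an index between records in RECORDS."""
    side, record_id = cursor[0], int(cursor[1:])
    if side == "b":                       # point just before the record
        return bisect_left(RECORDS, record_id)
    return bisect_right(RECORDS, record_id)   # point just after the record


def fetch_page(after=None, before=None, limit=3):
    """Return the records on one side of a point cursor."""
    if after is not None:
        start = position(after)
        data = RECORDS[start:start + limit]
    elif before is not None:
        end = position(before)
        data = RECORDS[max(0, end - limit):end]
    else:
        data = RECORDS[:limit]            # no cursor: first page
    cursors = {}
    if data:
        cursors = {"before": f"b{data[0]}", "after": f"a{data[-1]}"}
    return {"data": data, "paging": {"cursors": cursors}}
```

With this sketch, fetch_page(after="a30") moves on to the next page, while fetch_page(before="a30") and fetch_page(after="b10") both return the first page again.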

Cursor Represents a Batch

In this type, the cursor points to a contiguous batch (or page) of records in a specific dataset. This is the type used by Twitter and the response looks like this:

{
  "data": [ ... ],
  "paging": {
    "cursors": {
      "current": "123",
      "next": "xyz",
      "previous": "abc"
    },
    "next_page": "https://www.example.com/api/cats?cursor=xyz",
    "prev_page": "https://www.example.com/api/cats?cursor=abc"
  }
}

Unlike cursors that represent a single point, these cursors represent whole batches. The concept of before and after doesn't apply here; we retrieve the correct batch directly by passing its cursor, as in: https://www.example.com/api/cats?cursor=xyz

One caveat of losing before and after is that we no longer have a way to refer to the current page. A solution is to also return a cursor representing the current batch, like the "current" cursor in the response above.
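A rough sketch of the batch flavour, again in Python and again under my own assumptions: RECORDS, BATCH_SIZE and fetch_batch are made-up names, and the cursor is simply the batch number as a string, where a real API would return something opaque.

```python
# Hypothetical dataset and batch size.
RECORDS = list(range(1, 8))   # 1..7
BATCH_SIZE = 3


def fetch_batch(cursor="0"):
    """Return the batch named by `cursor`, plus cursors for its neighbours."""
    index = int(cursor)       # in this sketch the cursor is just the batch number
    start = index * BATCH_SIZE
    data = RECORDS[start:start + BATCH_SIZE]
    last_index = (len(RECORDS) - 1) // BATCH_SIZE
    cursors = {
        "current": str(index),
        "next": str(index + 1) if index < last_index else None,
        "previous": str(index - 1) if index > 0 else None,
    }
    return {"data": data, "paging": {"cursors": cursors}}
```

Passing the "current" cursor back to fetch_batch returns the same batch, which is the piece we lose without before and after.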

bds commented Dec 19, 2014

The Twitter API link you supplied says this:

The next_cursor is the cursor that you should send to the endpoint to receive the next batch of responses, and the previous_cursor is the cursor that you should send to receive the previous batch.

# response
{
    "ids": [
        333156387,
        333155835,
        ...
        101141469,
        92896225
    ],
    "next_cursor": 1323935095007282836,
    "next_cursor_str": "1323935095007282836",
    "previous_cursor": -1374003371900410561,
    "previous_cursor_str": "-1374003371900410561"
}
# request
https://api.twitter.com/foo?cursor=1323935095007282836
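From the client's side, walking such a result set could look like the Python sketch below. The fetch callable stands in for the HTTP request and is hypothetical, as is the FAKE_BATCHES data. The sketch also assumes that the first request uses a cursor of -1 and that a next_cursor of 0 means there are no more batches; check both against the API you are calling.

```python
def collect_ids(fetch, first_cursor=-1):
    """Follow next_cursor from batch to batch and gather every id."""
    ids = []
    cursor = first_cursor
    while True:
        response = fetch(cursor)
        ids.extend(response["ids"])
        cursor = response["next_cursor"]
        if cursor == 0:           # assumed end-of-results sentinel
            return ids


# Hypothetical in-memory stand-in for the endpoint, keyed by request cursor.
FAKE_BATCHES = {
    -1: {"ids": [5, 4], "next_cursor": 111, "previous_cursor": 0},
    111: {"ids": [3, 2], "next_cursor": 222, "previous_cursor": -111},
    222: {"ids": [1], "next_cursor": 0, "previous_cursor": -222},
}


def fake_fetch(cursor):
    return FAKE_BATCHES[cursor]
```

Calling collect_ids(fake_fetch) visits the three fake batches in order and returns all five ids.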

throws a link into the ring

https://cloud.google.com/appengine/docs/python/datastore/queries#Python_Query_cursors

The quotes from the above URL best captured my sentiment at the time.

My intent was to support forward, and then backward, navigation of result sets, excluding newly added records. Parameter names were chosen without regard to any particular "Datastore" or implementation.

After performing a retrieval operation, the application can obtain a cursor,
which is an opaque base64-encoded string marking the index position of the
last result retrieved. The application can save this string
(for instance in the Datastore, in Memcache, in a Task Queue task payload,
or embedded in a web page as an HTTP GET or POST parameter), and can then use
the cursor as the starting point for a subsequent retrieval operation to
obtain the next batch of results from the point where the previous retrieval
ended.

The API accepts a GET parameter, next_cursor, which encodes enough information to identify the last record sent to the client and to indicate that the client would like the next set of results.
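One way such a next_cursor could be built is sketched below in Python. This is not the Scripted API's actual implementation. The payload fields (last_id, dir), the helper names and the in-memory RECORDS list are all assumptions made for illustration, and record ids are assumed to be positive and ascending.

```python
import base64
import json


def encode_cursor(last_id, direction="next"):
    """Pack the last record id and the requested direction into an opaque token."""
    payload = json.dumps({"last_id": last_id, "dir": direction}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(token):
    """Recover the payload from a token produced by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


# Hypothetical dataset, ordered by id.
RECORDS = [{"id": i} for i in range(1, 8)]


def fetch(next_cursor=None, limit=3):
    """Return the records after the one named in next_cursor."""
    last_id = decode_cursor(next_cursor)["last_id"] if next_cursor else 0
    data = [r for r in RECORDS if r["id"] > last_id][:limit]
    token = encode_cursor(data[-1]["id"]) if data else None
    return {"data": data, "next_cursor": token}
```

The client never needs to look inside the token; it only passes the next_cursor value from one response back in the next request.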

A not-yet-implemented previous_cursor would encode the information to identify the last record sent to the client and to indicate that the client would like the previous set of results.

We could assume that we will only support forward pagination. Or, as the Twitter API specifies, the response could contain both the previous and next cursors while the request accepts only a single cursor param.

A retrieval can also specify an end cursor, to limit the extent of the result set returned.

Not Implemented by the Scripted API

Limitations of cursors

Cursors are subject to the following limitations:

A cursor can be used only by the same application that performed the original
query, and only to continue the same query. To use the cursor in a subsequent
retrieval operation, you must reconstitute the original query exactly,
including the same entity kind, ancestor filter, property filters, and sort orders.
It is not possible to retrieve results using a cursor without setting up the
same query from which it was originally generated.

Cursors and data updates

The cursor's position is defined as the location in the result list after the
last result returned. A cursor is not a relative position in the list (it's not an offset);
it's a marker to which the Datastore can jump when starting an index scan for results.
If the results for a query change between uses of a cursor, the query notices
only changes that occur in results after the cursor. If a new result appears
before the cursor's position for the query, it will not be returned when the
results after the cursor are fetched. Similarly, if an entity is no longer a
result for a query but had appeared before the cursor, the results that appear
after the cursor do not change. If the last result returned is removed from the
result set, the cursor still knows how to locate the next result.
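The behaviour described above can be imitated with a plain sorted list in Python. This is only an analogy and not how the Datastore works internally: the results list and fetch_after are hypothetical, and the cursor here is simply the value of the last result returned.

```python
from bisect import bisect_right, insort

results = [10, 20, 30, 40, 50]   # hypothetical sorted result list


def fetch_after(cursor, limit=2):
    """Return up to `limit` results located after `cursor`, plus the new cursor."""
    start = 0 if cursor is None else bisect_right(results, cursor)
    page = results[start:start + limit]
    return page, (page[-1] if page else cursor)


page1, cursor = fetch_after(None)     # first two results

insort(results, 15)                   # a new result appears before the cursor
page2, cursor = fetch_after(cursor)   # continues after 20; 15 is not returned
offset_page2 = results[2:4]           # an offset of 2 would now repeat 20

results.remove(40)                    # the last result returned is removed
page3, cursor = fetch_after(cursor)   # the cursor still finds the next result
```

The offset-based slice is included only for contrast: it shifts when a record is inserted earlier in the list, while the cursor-based fetch does not.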

An interesting application of cursors is to monitor entities for unseen changes.
If the app sets a timestamp property with the current date and time every time
an entity changes, the app can use a query sorted by the timestamp property,
ascending, with a Datastore cursor to check when entities are moved to the
end of the result list. If an entity's timestamp is updated, the query with
the cursor returns the updated entity. If no entities were updated since the
last time the query was performed, no results are returned, and the cursor does not move.
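The monitoring idea can be sketched in Python as follows. The entities dictionary, touch and poll_changes are hypothetical names, timestamps are plain integers, and the cursor is a (timestamp, entity id) pair, so this imitates the described behaviour without using the Datastore API.

```python
# Hypothetical store: entity id -> last-modified timestamp.
entities = {}


def touch(entity_id, timestamp):
    """Record that an entity changed at the given time."""
    entities[entity_id] = timestamp


def poll_changes(cursor):
    """Return ids of entities changed after `cursor`, oldest first, and the new cursor."""
    changed = sorted(
        (ts, eid) for eid, ts in entities.items() if (ts, eid) > cursor
    )
    new_cursor = changed[-1] if changed else cursor   # no changes: cursor stays put
    return [eid for _, eid in changed], new_cursor
```

Starting from a cursor such as (0, ""), each call to poll_changes returns only the entities touched since the previous call, and returns an empty list with an unchanged cursor when nothing was updated.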

When retrieving query results, you can use both a start cursor and an end
cursor to return a continuous group of results from the Datastore. When using
a start and end cursor to retrieve the results, you are not guaranteed that the
size of the results will be the same as when you generated the cursors.
Entities may be added or deleted from the Datastore between the time the
cursors are generated and when they are used in a query.
