@jrhemstad
Last active August 27, 2019 16:17
Musings on STRING cudf::columns
  • String Column Factory

    • make_string_column(...)
    • What are the necessary inputs?
    • Do we need more than one factory for String columns for different inputs?
  • String column wrapper type

    • There should be a type, cudf::string_column (spelled cudf::string_column_view in the example that follows), that is a thin wrapper around cudf::column and encodes behavior unique to string columns, e.g., it abstracts which child holds the offsets and which holds the characters.
    • Example usage would be something like:
      unique_ptr<cudf::column> col = make_string_column(...);
      
      // String-specific view constructible from a `cudf::column` of type STRING
      cudf::string_column_view strings{*col};
      
      // The wrapper type abstracts the roles of children
      column_view offsets = strings.offsets(); // returns child that holds offsets
      column_view chars = strings.characters(); // returns child that holds characters
      
      // Top-level string-specific APIs can accept string_column_views directly
      void string_specific_function(string_column_view strings);
      
      // The wrapper type can provide string-specific device functions
      __global__ void kernel(string_column_view strings1, string_column_view strings2) {
         ...
         // compares string `i` in `strings1` to string `j` in `strings2`
         compare_strings(strings1, i, strings2, j);
      }
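
To make the offsets/characters split concrete, here is a minimal host-only sketch in plain C++. Nothing in it is the real cudf API: `mock_column`, `make_mock_string_column`, `mock_string_column_view`, and `element` are hypothetical stand-ins, and the layout (one offsets child with `size + 1` entries, one characters child holding the concatenated bytes) is an assumption about how such a column might be arranged. Null handling and device memory are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a STRING column; NOT the real cudf::column.
// Assumed layout: two "children", offsets and characters.
struct mock_column {
  std::vector<std::int32_t> offsets;  // size() + 1 entries; offsets[i] is where string i begins
  std::vector<char> chars;            // all strings concatenated, no terminators
};

// Hypothetical factory: one possible set of inputs is a host vector of strings.
std::unique_ptr<mock_column> make_mock_string_column(std::vector<std::string> const& strings) {
  auto col = std::make_unique<mock_column>();
  col->offsets.push_back(0);
  for (auto const& s : strings) {
    col->chars.insert(col->chars.end(), s.begin(), s.end());
    col->offsets.push_back(static_cast<std::int32_t>(col->chars.size()));
  }
  return col;
}

// Thin, non-owning wrapper that hides which child plays which role.
class mock_string_column_view {
 public:
  explicit mock_string_column_view(mock_column const& col) : col_{&col} {
    assert(!col.offsets.empty());  // a valid column always has at least the leading 0 offset
  }

  std::size_t size() const { return col_->offsets.size() - 1; }
  std::vector<std::int32_t> const& offsets() const { return col_->offsets; }
  std::vector<char> const& characters() const { return col_->chars; }

  // Copies string `i` out of the characters child using its pair of offsets.
  std::string element(std::size_t i) const {
    auto const begin = col_->offsets[i];
    auto const end   = col_->offsets[i + 1];
    return std::string(col_->chars.begin() + begin, col_->chars.begin() + end);
  }

 private:
  mock_column const* col_;
};

// Compares string `i` of `a` with string `j` of `b`.
// Returns <0, 0, or >0, like std::string::compare.
int compare_strings(mock_string_column_view const& a, std::size_t i,
                    mock_string_column_view const& b, std::size_t j) {
  return a.element(i).compare(b.element(j));
}
```

In this sketch the caller never indexes the children directly: `offsets()` and `characters()` expose them by role, and `compare_strings` only goes through the view. A device version would presumably return lightweight pointer/length pairs rather than copying into `std::string`.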
      
      