In the summer of 2020 I worked with Paul Oullette in a Research Experience for Undergraduates (REU) led by Professor Fatemeh Nargesian. Our goal was to build a dataset search engine that would show off some of Professor Nargesian's research.
- Build a publicly available dataset search engine.
- Allow users to use keyword search to find datasets.
- When viewing a table in a dataset, show columns in other datasets that can be joined with columns in said table. This would show off Professor Nargesian's LSH Ensemble paper.
I focused on the keyword search engine while Paul focused on the joinability engine.
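
To make the joinability goal concrete, here is a minimal sketch of one common way to score whether two columns can be joined: set containment, the fraction of a query column's distinct values that also appear in a candidate column. This is a brute-force illustration only, not the LSH Ensemble algorithm and not the project's code; LSH Ensemble is an index for answering this kind of containment query approximately at scale. The function names, the sample columns, and the threshold are all hypothetical.

```python
def containment(query, candidate):
    """Fraction of distinct query values that also appear in candidate."""
    query_set = set(query)
    if not query_set:
        return 0.0
    return len(query_set & set(candidate)) / len(query_set)


def joinable_columns(query, columns, threshold=0.8):
    """Return (name, score) pairs whose containment meets the threshold, best first."""
    scored = ((name, containment(query, values)) for name, values in columns.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical columns taken from other datasets.
columns = {
    "census.state": ["NY", "CA", "TX", "WA", "OR"],
    "parks.region": ["NY", "CA", "FL"],
    "sales.sku": ["A1", "B2", "C3"],
}

# Column of the table the user is currently viewing (duplicates are ignored).
query = ["NY", "CA", "TX", "NY"]

for name, score in joinable_columns(query, columns, threshold=0.6):
    print(f"{name}: {score:.2f}")
```

Comparing the query against every column like this is linear in the number of columns, which is why a real system would want an index instead.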
At the end of the summer I left the project to spend more time learning about programming languages and compilers.
Paul stuck with it. He removed the keyword search engine because it didn't perform well (as described in the Weaknesses section), and replaced it with the faiss search engine. He added the ability to generate a directory structure over the datasets based on this paper from Professor Nargesian. His work eventually culminated in a demo at VLDB '21.
The purpose of the keyword search engine was to help users find datasets. We wanted the user experience to be similar to that of Google's dataset search engine.