- Xavier Dupré (Microsoft): keep .fit in the API, but allow X to be a stream, for example from Spark, in a way that is transparent for the user. Gaël: if X is indexable and len(X) is n_samples, is that good enough? Answer: X would be accessible through a sequential iterator.
- Jean-François Puget (IBM): IBM is betting on Spark at the scale of the company. Most machine learning applications have small data but some don't. How can the bridge between scikit-learn and Spark get better? How can scikit-learn be used in a distributed environment? Not all algorithms can work out-of-core; some need a distributed algorithm.
- Jean-Paul Smet (Nexedi): Nexedi is an example company. Wendelin.core helps us remove the overhead and enables out-of-core computing. Next step of the story in a year.
- Fabian Mangeant (Airbus): Big industrial companies like Airbus need a few years of visibility. They can help with funding, but in exchange for stability and guarantees on long-term support of some releases.
- Jean-François Puget (IBM): Some companies (maybe IBM) are interested in having scikit-learn scale better on clouds and distributed systems. They might contribute expertise and platforms for experimentation.
- Jean-Paul Smet (Nexedi): It seems that you have a hiring problem. Do you need money, or people?
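
To make the first point above concrete, here is a minimal sketch of what "keep .fit, but let X be a sequential iterator" could look like. It is an illustration under stated assumptions, not scikit-learn's API: `StreamingMean` and `row_stream` are hypothetical names, and a plain Python generator stands in for a Spark stream.

```python
from typing import Iterable, Iterator, List, Sequence


class StreamingMean:
    """Hypothetical estimator: learns per-feature means in one sequential pass."""

    def fit(self, X: Iterable[Sequence[float]]) -> "StreamingMean":
        # X only needs to support iteration: no len(X), no indexing.
        n_samples = 0
        sums: List[float] = []
        for row in X:
            if not sums:
                # Allocate the accumulators once the first row reveals n_features.
                sums = [0.0] * len(row)
            for j, value in enumerate(row):
                sums[j] += value
            n_samples += 1
        if n_samples == 0:
            raise ValueError("X yielded no samples")
        self.n_samples_seen_ = n_samples
        self.mean_ = [s / n_samples for s in sums]
        return self


def row_stream() -> Iterator[List[float]]:
    """Stand-in for a remote source (assumption); yields rows lazily."""
    for i in range(4):
        yield [float(i), 2.0 * i]


# The same .fit call accepts a lazy stream or an in-memory list.
model = StreamingMean().fit(row_stream())
same = StreamingMean().fit([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

The user-facing call is unchanged whether the data is in memory or streamed. The cost is that the estimator may only make sequential passes over X, which, as noted in the discussion, not every algorithm can do.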
Notes taken by Loic Esteve and Gaël Varoquaux. We hope that we haven't distorted too much what was said, or forgotten anything important.
Have you seen this?
Adapts Spark to out-of-core and distributed computation.
https://github.com/jcrist/dask-learn