the general algo goes like so:
    for chunk in corpus:
        e-step
        m-step
gensim hacks in multiple passes:
    for pass_ in passes:
        for chunk in corpus:
            e-step
            m-step
what we've been doing (only works for batch):
    for pass_ in passes:
        for bound_iter in iters:
            for chunk in corpus:
                e-step
                m-step
            break if done
for online updates, would it make more sense to:
    for chunk in corpus:
        for bound_iter in iters:
            e-step
            m-step
            break if done
this would give us something that works the same for batch (via chunksize=len(corpus) and bound_iters > 1) but also something that works for online mode (via chunksize<len(corpus) and bound_iters > 1).
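a rough sketch of that proposed loop, just to make the structure concrete. none of this is gensim code: `chunked`, `fit`, `make_toy`, the `tol` threshold and the toy "model" (a running mean nudged halfway toward each chunk's mean) are all made up for illustration, and "done" is assumed to mean "the bound changed by less than tol".

```python
def chunked(corpus, chunksize):
    """Yield consecutive slices of the corpus, each at most chunksize long."""
    for start in range(0, len(corpus), chunksize):
        yield corpus[start:start + chunksize]


def fit(corpus, chunksize, bound_iters, e_step, m_step, bound, tol=1e-6):
    """Proposed loop: repeat E/M on each chunk until the bound stops moving.

    e_step, m_step and bound are hypothetical callables supplied by the caller.
    Returns how many E/M iterations were spent on each chunk.
    """
    iters_used = []
    for chunk in chunked(corpus, chunksize):
        previous = None
        used = 0
        for _ in range(bound_iters):
            stats = e_step(chunk)
            m_step(stats)
            used += 1
            current = bound(chunk)
            # "break if done": assumed here to mean the bound moved by < tol
            if previous is not None and abs(current - previous) < tol:
                break
            previous = current
        iters_used.append(used)
    return iters_used


def make_toy():
    """Toy stand-in for a model: a single mean that drifts toward the data."""
    state = {"mu": 0.0}

    def e_step(chunk):
        # sufficient statistic for the toy model: the chunk mean
        return sum(chunk) / len(chunk)

    def m_step(chunk_mean):
        # move halfway toward the chunk mean, so several iterations are needed
        state["mu"] += 0.5 * (chunk_mean - state["mu"])

    def bound(chunk):
        # negative squared error, playing the role of the variational bound
        return -sum((x - state["mu"]) ** 2 for x in chunk)

    return state, e_step, m_step, bound


corpus = [1.0, 2.0, 3.0, 4.0]

# batch: one chunk covering the whole corpus
state, e_step, m_step, bound = make_toy()
print("batch iterations per chunk:", fit(corpus, len(corpus), 50, e_step, m_step, bound))

# online: smaller chunks, exactly the same loop
state, e_step, m_step, bound = make_toy()
print("online iterations per chunk:", fit(corpus, 2, 50, e_step, m_step, bound))
```

the point is only that batch vs. online becomes a choice of chunksize; the loop body doesn't change.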
Well, the main difference between bound_iter and passes is that bound_iter has a convergence criterion (it's not just "do this 10 times"), and secondly bound_iter doesn't update the decay.
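to illustrate the decay point, here's a small sketch. it assumes the step size has the usual online variational Bayes form rho = (offset + t) ** -decay, with t counting chunks seen so far; gensim's exact formula may differ, and `step_size` and `schedule` are hypothetical names. the thing to notice is that rho is computed outside the bound_iter loop, so extra bound iterations on a chunk don't move it.

```python
def step_size(offset, decay, updates_seen):
    """Assumed step-size form: (offset + updates_seen) ** -decay."""
    return (offset + updates_seen) ** (-decay)


def schedule(num_chunks, bound_iters, offset=1.0, decay=0.5):
    """List (chunk_index, bound_iter, rho) for every inner iteration."""
    rows = []
    for chunk_index in range(num_chunks):
        # rho is computed once per chunk, outside the bound_iter loop
        rho = step_size(offset, decay, chunk_index)
        for bound_iter in range(bound_iters):
            rows.append((chunk_index, bound_iter, rho))
    return rows


for chunk_index, bound_iter, rho in schedule(num_chunks=3, bound_iters=2):
    print(chunk_index, bound_iter, round(rho, 4))
```

a pass, by contrast, would presumably feed into t (or an equivalent counter), so repeating passes keeps shrinking the step size while repeating bound_iters does not.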