Evidently, the RAM required to prove the root rollup is a pain point. One relatively simple way to alleviate this would be to split the work across multiple circuits. We need to investigate (a) the extent to which this is really an issue and (b) how much splitting would be needed to alleviate it.