Inference/Test Time Scaling
How does performance improve with more compute at test time?
In what ways can we use more compute at test time?
Parallel output - running the model multiple times and taking a majority vote over the solutions
Longer output - generating a longer chain of thought
These two approaches are categorised as parallel vs. sequential scaling
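The parallel approach above can be sketched in a few lines. This is a minimal illustration, not any library's API: `sample_model` is a hypothetical stand-in for a call that samples one answer from a model (in practice with temperature > 0 so samples differ), and the final answer is whichever solution appears most often.

```python
from collections import Counter
from typing import Callable, List

def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def parallel_scaling(sample_model: Callable[[str], str],
                     prompt: str, n_samples: int = 8) -> str:
    # Spend more test-time compute by sampling the model n_samples times
    # (independent runs, so they can execute in parallel), then vote.
    answers = [sample_model(prompt) for _ in range(n_samples)]
    return majority_vote(answers)

# Toy stand-in for a sampled model call (hypothetical, for illustration only)
def toy_model(prompt: str) -> str:
    import random
    return random.choice(["42", "42", "42", "17"])  # mostly-correct sampler

answer = parallel_scaling(toy_model, "What is 6 * 7?", n_samples=8)
```

Accuracy typically improves with `n_samples` as long as the correct answer is the model's modal output, which is why this is often paired with answer normalisation (e.g. extracting just the final number) before voting.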