
PytorchExperiment resuming doesn't work as expected
Open, High, Public

Description

It's impossible to resume a PytorchExperiment with just

Exp(resume="some_experiment")

because there is no base_dir key in self._config_raw. By default the latter is constructed with

self._config_raw = self._config_raw_from_input(config, name, n_epochs, seed, append_rnd_to_name)

where config is None by default, so there is no base_dir to read. The important point is that PytorchExperiment SHOULD NOT REQUIRE a base_dir. Because the initialization code is very long and complex, I suggest refactoring it so that it is easier to understand and such mistakes are avoided.
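To illustrate the kind of fallback meant here, a minimal sketch follows. It is not the library's code: resolve_resume_dir is a hypothetical helper, and the assumption is that the raw config behaves like a plain dict that may or may not contain a base_dir key. The idea is only that a missing base_dir should not raise, and that the resume argument is then used as given.

```python
import os
from typing import Optional


def resolve_resume_dir(resume: str, config: Optional[dict] = None) -> str:
    """Hypothetical helper: work out which directory to resume from.

    Assumes `config` is a plain dict (or None) that may lack "base_dir".
    """
    config = config or {}
    # dict.get returns None instead of raising KeyError when the key is absent.
    base_dir = config.get("base_dir")
    if base_dir is not None and not os.path.isabs(resume):
        # A base_dir is available: interpret `resume` relative to it.
        return os.path.join(base_dir, resume)
    # No base_dir (or an absolute path was given): use `resume` unchanged.
    return resume


# Works without any config, mirroring Exp(resume="some_experiment").
print(resolve_resume_dir("some_experiment"))
print(resolve_resume_dir("some_experiment", {"base_dir": "runs"}))
```

Isolating the lookup in one small function like this is also one possible first step of the suggested refactoring, since the behaviour for a missing base_dir can then be tested on its own.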

Event Timeline

petersej triaged this task as High priority. Thu, Nov 14, 3:51 PM
petersej created this task.
petersej created this object with edit policy "Custom Policy".