pytorch / xla
The `callbacks` entry in the checkpoint created by the function below raised `TypeError: 'mappingproxy' object does not support item assignment` when running `xm.save(checkpoint, ...)`:
```python
def on_save_checkpoint(self):
    """Called when saving a model checkpoint."""
    callback_states = {}
    for callback in self.callbacks:
        callback_class = type(callback)
        state = callback.on_save_checkpoint(self, self.get_model())
        if state:
            callback_states[callback_class] = state
    return callback_states
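For context (my own analysis, not stated in the thread): `callback_states` uses class objects as dictionary keys, and a class's `__dict__` is a read-only `mappingproxy`, so any save routine that tries to rewrite those keys or their attributes in place will fail. A minimal, framework-free illustration:

```python
# Minimal illustration (no PyTorch required): a class's __dict__ is a
# read-only mappingproxy, so item assignment on it raises TypeError.
class DummyCallback:
    pass

# Mirrors the checkpoint shape: the *class* itself is the dict key.
callback_states = {type(DummyCallback()): {"monitor": "val_loss"}}

key = next(iter(callback_states))    # the class object itself
print(type(key.__dict__).__name__)   # prints "mappingproxy"

try:
    key.__dict__["patched"] = True   # what a naive in-place transform may attempt
except TypeError as e:
    print(e)                         # 'mappingproxy' object does not support item assignment
```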
Would it be possible for you to resolve this bug? It is currently breaking PyTorch Lightning's TPU support for callback states.
Steps to reproduce the behavior:
ananthsub mentioned this issue Mar 11, 2021

tchaton commented Mar 30, 2021

Yes, `state_dict` works fine, but in PyTorch Lightning we also save the trainer state (callbacks, etc.).
Modify this line to trigger the error:

```python
self.save(checkpoint, filepath)
```
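One possible workaround on the Lightning side (a sketch of my own, not an official fix): convert the class-object keys to plain strings before handing the checkpoint to `xm.save`, so the saver only ever sees picklable, immutable keys. The helper name `stringify_callback_keys` is hypothetical:

```python
def stringify_callback_keys(callback_states):
    """Replace class-object keys with their qualified names (hypothetical helper).

    Class objects carry a read-only mappingproxy __dict__, which can trip up
    recursive checkpoint transforms; plain strings are always safe dict keys.
    """
    return {
        f"{cls.__module__}.{cls.__qualname__}": state
        for cls, state in callback_states.items()
    }


class EarlyStopping:  # stand-in for a real Lightning callback class
    pass


states = {EarlyStopping: {"wait_count": 3}}
print(stringify_callback_keys(states))
# e.g. {'__main__.EarlyStopping': {'wait_count': 3}} (module name depends on context)
```

Loading then requires resolving the string back to the class, which is the trade-off of this approach.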
zcain117 (Collaborator) commented Apr 2, 2021
```shell
# Create the VM
gcloud compute instances create zcain-vm --zone=us-central1-a --machine-type=e2-highmem-16 \
  --image-family=torch-xla --image-project=ml-images --boot-disk-size=300GB \
  --scopes=https://www.googleapis.com/auth/cloud-platform

# Create the TPU
gcloud compute tpus create zcain-tpu --zone=us-central1-a --network=default \
  --version=pytorch-nightly --accelerator-type=v3-8

# SSH into the VM, then:
git clone https://github.com/PyTorchLightning/pytorch-lightning.git
conda activate torch-xla-nightly
export TPU_IP_ADDRESS=
export XRT_TPU_CONFIG="tpu_worker;0;$TPU_IP_ADDRESS:8470"
cd pytorch-lightning
pip install .
pip install -r requirements/test.txt
cd pl_examples/domain_templates/
vim computer_vision_fine_tuning.py
python computer_vision_fine_tuning.py
```
If I use torch-xla-nightly, I get an unrelated crash that seems to be caused by TensorBoard.
If I use torch-xla-1.8 I get a crash like this:
```
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/tpu_spawn.py", line 116, in new_process
    results = trainer.run_stage()
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 535, in run_stage
    self.run_train()
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 599, in run_train
    self.train_loop.run_training_epoch()
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    self.trainer.run_evaluation(on_epoch=True)
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 646, in _save_top_k_checkpoint
    self._update_best_and_save(current, epoch, step, trainer, monitor_candidates)
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 708, in _update_best_and_save
    self._save_model(trainer, filepath)
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 410, in _save_model
    self._do_save(trainer, filepath)
File "/anaconda3/envs/torch-xla-1.8/lib/python3.6/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 422, in _do_save
    self.save_function(filepath, self.save_weights_only)
TypeError: 'mappingproxy' object does not support item assignment
```
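To see how a crash of this shape can surface inside the save path, here is a toy reproduction under my assumption that the saver recursively walks nested objects and rewrites their `__dict__` entries in place (roughly what a move-tensors-to-CPU transform has to do). This is not the real `xm.save` code, only the same failure mode:

```python
def naive_inplace_transform(obj, seen=None):
    # Toy stand-in for a recursive "move everything to CPU" pass;
    # NOT the real xm.save internals, just the same failure shape.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return obj
    seen.add(id(obj))
    if isinstance(obj, dict):
        for k in obj:
            naive_inplace_transform(k, seen)       # keys are visited too...
            naive_inplace_transform(obj[k], seen)
        return obj
    if hasattr(obj, "__dict__"):
        for name, value in list(obj.__dict__.items()):
            # For a *class* key, __dict__ is a read-only mappingproxy:
            obj.__dict__[name] = naive_inplace_transform(value, seen)
        return obj
    return obj


class SomeCallback:
    pass


# The Lightning checkpoint keys the callbacks dict by class objects:
checkpoint = {"callbacks": {SomeCallback: {"best_score": 0.1}}}
try:
    naive_inplace_transform(checkpoint)
except TypeError as e:
    print(e)  # 'mappingproxy' object does not support item assignment
```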
If I use torch-xla-1.7, then I get the same crash.
@tchaton it seems like your CI is using torch-xla-1.7, since I see a line like this: `Downloaded newer image for pytorchlightning/pytorch_lightning:base-xla-py3.6-torch1.7`. It also seems there is a test that runs with checkpoints disabled here: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/tests/models/test_tpu.py#L391
Some of my questions: