Mounting a writable existing persistent disk? #251
Comments
For persistent and existing disks, dsub always mounts them read-only (e.g. https://github.com/DataBiosphere/dsub/blob/main/dsub/providers/google_v2_base.py#L721). We do briefly mention that resource data can be mounted read-only: https://github.com/DataBiosphere/dsub#mounting-resource-data The idea is that these disks contain resource data to be used as inputs for your workflows, so it made sense for dsub to mount them as read-only. Would you mind describing your use case for mounting these disks as writable?
Yep, that's what I was referring to. I would heartily agree that it almost always makes sense for dsub to mount as read-only for resource/reference data.

I was chaining together an analysis that could benefit from a shared persistent disk for performance: one step operates collectively on a set of large files, and a second step then uses the combined result from the first step together with each individual file. The second step, of course, is very easily run in parallel using dsub. Since I have to copy all of the files out of GCS for the first step, using a persistent disk here would save having to copy the files out of storage twice. Not a particularly lengthy operation, I know (though it is a sizable amount of data), but it's nice if you don't have to do it multiple times. And there are more downstream steps on the whole and individual results from the first two steps. It is also useful to have all the data on the one persistent disk should something inevitably go wrong during development/processing.

But the real reason was a desire to make setting up the resource/reference disk part of the whole process, to make it easier to maintain: create the disk and populate it with a dsub job, run the combined step in the next dsub job, then do the multiple runs with a dsub tasks job. And sure, I could resort to using a workflow orchestrator (nextflow, snakemake, etc.) for this, but it is a pretty straightforward couple of steps.
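For readers following along, here is a minimal sketch of how the last two steps described above (the combined step, then the per-file tasks job) might be chained with dsub as it behaves today, with the disk mounted read-only. It only builds the command lines and does not run anything. The project, zone, disk, bucket, script names (`combine.sh`, `per_file.sh`), the `SAMPLE` variable and the tasks file name are all hypothetical placeholders. Because the mount is read-only, the disk is assumed to have been populated outside of dsub, and any outputs would have to go elsewhere (for example to GCS).

```python
import shlex


def disk_url(project, zone, disk):
    """Full Compute Engine URL for a zonal persistent disk."""
    return ("https://www.googleapis.com/compute/v1/"
            f"projects/{project}/zones/{zone}/disks/{disk}")


def dsub_command(project, zone, disk, logging, command, name,
                 tasks_file=None, after=None):
    """Build (but do not run) a dsub invocation that mounts an existing disk.

    dsub exposes the mount point to the command as ${RESOURCES} and,
    at the time of this thread, always mounts it read-only.
    """
    args = [
        "dsub",
        "--provider", "google-cls-v2",
        "--project", project,
        "--zones", zone,  # the VM must run in the same zone as the disk
        "--logging", logging,
        "--name", name,
        "--mount", f"RESOURCES={disk_url(project, zone, disk)}",
        "--command", command,
    ]
    if tasks_file:
        args += ["--tasks", tasks_file]  # one task per row of the TSV
    if after:
        args += ["--after", after]  # wait for an earlier job to finish
    return args


# Hypothetical values, for illustration only.
common = dict(project="my-project", zone="us-central1-a",
              disk="my-disk", logging="gs://my-bucket/logs")

combine = dsub_command(command='combine.sh "${RESOURCES}"',
                       name="combine", **common)
per_file = dsub_command(command='per_file.sh "${RESOURCES}" "${SAMPLE}"',
                        name="per-file", tasks_file="samples.tsv",
                        after="JOB_ID_PRINTED_BY_COMBINE_STEP", **common)

print(shlex.join(combine))
print(shlex.join(per_file))
```

The job ID passed to `--after` is whatever dsub prints when the first job is submitted; the placeholder string above stands in for it.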
I've had some good success mounting existing read-only persistent disks to the VM running a dsub job, and it's very cool that one can do this. However, I was wondering about attached writable disks. According to the Life Sciences API documentation:
I'm not exactly sure what they mean by "Mount references". Do they mean that the disk is attached to zero or more VMs in read-only mode? That would seem to be what is implied by the description. (I'm not sure how, outside of the VM, GCP would explicitly know how the disk is actually mounted.) I've done some testing with a persistent disk that's unattached to any VMs, and one that was already attached in read-only mode to a VM, and in either case when I launch a dsub job the persistent disk is always attached in read-only mode.
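One possible reading of "Mount references", offered as a sketch rather than a definitive answer: in the Life Sciences v2beta pipeline request, each action carries a list of `mounts`, and each mount names a volume declared on the VM. On that reading, the references are the mount entries inside the request itself, not attachments to other VMs. The field names below (`actions`, `mounts`, `disk`, `path`, `readOnly`, `volumes`, `existingDisk`) follow my understanding of the v2beta reference and should be checked against it. The helper `attach_mode`, the rule it encodes and all the values are assumptions for illustration.

```python
def attach_mode(pipeline, volume_name):
    """Guess how an existing disk would be attached for this request.

    Assumed rule: the disk is attached read-only only if every mount
    that references the volume sets readOnly to True.
    """
    mounts = [
        mount
        for action in pipeline.get("actions", [])
        for mount in action.get("mounts", [])
        if mount.get("disk") == volume_name
    ]
    if all(mount.get("readOnly", False) for mount in mounts):
        return "read-only"
    return "read-write"


# Hypothetical request fragment; names and paths are placeholders.
pipeline = {
    "actions": [
        {
            "imageUri": "ubuntu",
            "commands": ["ls", "/mnt/resources"],
            "mounts": [
                {"disk": "resources", "path": "/mnt/resources",
                 "readOnly": True},
            ],
        },
    ],
    "resources": {
        "virtualMachine": {
            "volumes": [
                {"volume": "resources",
                 "existingDisk": {"disk": "my-disk"}},
            ],
        },
    },
}

print(attach_mode(pipeline, "resources"))
```

If this reading is right, it would explain the behaviour reported above: dsub sets the read-only flag on every mount it generates for an existing disk, so the disk is attached read-only regardless of whether it is already attached elsewhere.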