how: use DVC when data is stored in an external drive #563
Comments
These are related/solve similar problems: #455 (fixes #103). Keep in mind:
The solution described by @efiop (tracking a data file that lives outside the DVC project) seems to be a different one. Having a remote DVC cache (the same as in multiple-users-on-a-single-machine) is another solution. The NFS case seems to have a solution similar to multiple-users-on-a-single-machine.
@dashohoxha gotcha. This is a different one indeed. These sections, https://dvc.org/doc/user-guide/external-outputs and https://dvc.org/doc/user-guide/external-dependencies, should be reorganized or at least taken into account. Also, keep in mind my take on this: there should be a very strong reason to complicate your workflow with external deps/outs/cache in the case of multiple drives. As I mentioned on Discord, I think in most cases the ideal scenario is to use an external cache and symlinks (similar to the NFS and shared cache scenarios).
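To make the "external cache and symlinks" idea concrete, here is a minimal Python sketch of the mechanism. It is only an illustration, not DVC's implementation: the `add_to_cache` helper, the directory names, and the choice of SHA-256 are all assumptions made for the example. In DVC itself, the equivalent setup would be done by pointing the cache at the external drive with `dvc cache dir` and setting `cache.type` to `symlink` via `dvc config`.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def add_to_cache(workspace_file: Path, cache_dir: Path) -> Path:
    """Move a file into a content-addressed cache and leave a symlink behind.

    Hypothetical helper for illustration only. It reads the whole file
    into memory to hash it; a real tool would hash in chunks.
    """
    digest = hashlib.sha256(workspace_file.read_bytes()).hexdigest()
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if cached.exists():
        # Same content is already cached: drop the workspace copy.
        workspace_file.unlink()
    else:
        # shutil.move also works across partitions (copy, then delete).
        shutil.move(str(workspace_file), str(cached))
    # The workspace keeps only a lightweight link to the cached data.
    workspace_file.symlink_to(cached)
    return cached


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-ins for the large external partition and the small /home one.
        external_cache = root / "external_drive" / "dvc-cache"
        project = root / "home" / "project"
        external_cache.mkdir(parents=True)
        project.mkdir(parents=True)

        data = project / "data.bin"
        payload = b"x" * 1024
        data.write_bytes(payload)

        cached = add_to_cache(data, external_cache)
        return {
            "is_symlink": data.is_symlink(),
            "points_into_cache": external_cache in cached.parents,
            "readable_through_link": data.read_bytes() == payload,
        }


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the bytes live only on the external drive, while the project directory holds links, so the small partition never has to store the dataset.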
They seem accurate to me (unless there is some missing information that I don't know about).
@dashohoxha your PR looks good. There are some improvements that can be made, which I'll review and let you know about, but first I would like to better understand the use case itself: what the possible solutions for it are, how we should improve those sections in the User Guide, and how all of this corresponds with the shared machine case (when there is a single cache set up on a separate partition). Without this holistic plan, we are potentially duplicating information, not properly communicating the use case, and not properly structuring the User Guide. To give just some concerns:
Some better titles off the top of my head:
So, let's please discuss and understand the strategy behind this. @jorgeorpinel, I would love to hear your opinion on this.
Yes! It's funny because I've been noticing significant confusion around external X topics, so I opened #566 recently. I also feel like we may need to regroup and figure out the connections between all the external data stuff before deciding which docs to change. That said, it's good to have more use cases and I'll review the PR, but if we don't figure out the big picture, this doc may only add to the confusion of some users, like Dashamir mentioned in #563 (comment).
I think that at this point it's unclear whether a how-to is needed, and most of the content will be covered by #520. Can we close this, @shcheklein? Thanks
My take on this in general is that you have 4 routes when working with data from external drives:
Other than route 4, which we probably don't need to document, we have info about all of this in the docs. We may just need to consolidate it somewhere in the future Data Management guides. I added a bullet there, and with that and #520 I think we should close this as redundant.
Should we repurpose this ticket to focus specifically on managing external data on NAS? @shcheklein More details in https://discuss.dvc.org/t/setup-dvc-to-work-with-shared-data-on-nas-server/180 (top forum question)
We made updates to the guide about external data as part of the 3.0 release, so I'm closing this since I don't see additional actions we can take right now. Feel free to reopen if I missed something.
E: Check whether #520 was done first... See also #899
This doc should explain the best solution (or a couple of possible solutions) for this situation.
Example: the data is located in a partition of size 16TB on an external drive, while the DVC project is on /home of a partition of size 320GB.