
Add a few additional failures to our notes doc #8980

Open. Wants to merge 2 commits into base: ah_var_store.

Conversation

@RoriCremer (Contributor) commented Sep 15, 2024

no automated testing needed--just documentation edits

@rsasch self-requested a review September 26, 2024 13:58
@rsasch left a comment:

Some specific changes; also, I think it would be useful to have a consistent way of attaching each failure to a specific workflow, sub-workflow, and task for easier use.

scripts/variantstore/beta_docs/gvs-troubleshooting.md (6 inline review comments, outdated and resolved)
@RoriCremer (Contributor, Author) commented:

At some point we added to our notes: "1. It is important to verify that the data has ALL made it into the BQ dataset or not", but I'm not sure how to do this with Beta users, or if it makes any sense at all.
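One possible spot check: a minimal sketch that counts the distinct samples actually loaded, assuming the GVS BigQuery dataset exposes a `sample_info` table with a `sample_name` column (verify both names against your dataset; the project and dataset IDs below are illustrative). The count can then be compared against the number of samples in the Terra data model.

```bash
# Count distinct loaded samples in the GVS dataset
# (table, column, project, and dataset names are assumptions).
bq query --project_id=my-project --use_legacy_sql=false \
  'SELECT COUNT(DISTINCT sample_name) AS loaded_samples
   FROM `my-project.my_gvs_dataset.sample_info`'
```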

@RoriCremer RoriCremer marked this pull request as ready for review October 18, 2024 04:41
@mcovarr (Collaborator) left a comment:

Not for this PR, but it's a little concerning there are so many failure modes that require manual intervention

@kbergin (Collaborator) left a comment:

I'm sorry this took me SO long to review. Comments throughout.

1. Clean up the BQ dataset manually by deleting it and recreating it fresh
1. Make sure to keep the call caching on and run it again
1. Ingest failure with error message: `A USER ERROR has occurred: Cannot be missing required value for ___`
Collaborator left a comment:

Is the error message really "Cannot be missing required value for"? It just seems like an improperly formed sentence.

1. Ingest failure with error message: `A USER ERROR has occurred: Cannot be missing required value for ___`
1. (e.g. alternate_bases.AS_RAW_MQ, RAW_MQandDP or RAW_MQ)
1. This means that there is at least one incorrectly formatted sample in your data model. Confirm your GVCFs are reblocked. If the incorrectly formatted samples are a small portion of your callset and you wish to just ignore them, simply delete them from the data model and restart the workflow without them. There should be no issue with starting from here, as none of these samples were loaded.
Collaborator left a comment:

You could link here to the full list of annotations required for a GVCF to work in GVS:
https://github.com/broadinstitute/gatk/blob/ah_var_store/scripts/variantstore/beta_docs/run-your-own-samples.md#gvcf-annotations
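For a quick pre-ingest check, a minimal sketch that greps one GVCF's header for the annotations named above; the filename is illustrative, and note that `RAW_MQandDP` and `RAW_MQ` are alternatives, so only one of the two is expected to appear.

```bash
# Print header lines defining the required annotations, stopping at #CHROM.
zcat sample.g.vcf.gz | sed '/^#CHROM/q' | grep -E 'ID=(AS_RAW_MQ|RAW_MQandDP|RAW_MQ),'
```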

1. (e.g. alternate_bases.AS_RAW_MQ, RAW_MQandDP or RAW_MQ)
1. This means that there is at least one incorrectly formatted sample in your data model. Confirm your GVCFs are reblocked. If the incorrectly formatted samples are a small portion of your callset and you wish to just ignore them, simply delete them from the data model and restart the workflow without them. There should be no issue with starting from here, as none of these samples were loaded.
1. Extract failure with OSError: Is a directory. If you point your extract to a directory that doesn’t already exist, it will not be happy about this. Simply make the directory and run the workflow again.
Collaborator left a comment:

Suggested change
1. Extract failure with OSError: Is a directory. If you point your extract to a directory that doesn’t already exist, it will not be happy about this. Simply make the directory and run the workflow again.
1. Extract failure with OSError: Is a directory.
1. If you point your extract to a directory that doesn’t already exist, the workflow fails. Make the directory and run the workflow again.

I'm surprised this is an error the workflow can run into, because our documentation suggests people get the workspace bucket and then add a subdirectory to that based on where they want the callset written. Usually that bucket doesn't actually already exist. Am I misunderstanding your description of this error?
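If the failing path really is a local/POSIX path rather than a gs:// one, the fix is just a pre-created directory; a minimal sketch with an illustrative path:

```bash
# Pre-create the extract output directory before rerunning the workflow
# (GCS has no real directories, so this should not apply to gs:// paths).
mkdir -p /path/to/extract_output
```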

1. Ingest failure with: `Lock table error`
1. This means that the lock table has been created, but that the ingest has failed soon after or that perhaps during manual cleanup from another failure, some underlying data was deleted
1. The lock table can simply be deleted -- `sample_id_assignment_lock` -- and the ingest can be kicked off again
Collaborator left a comment:

We don't talk about the lock table anywhere else in our documentation. I think this will need more detail to be clear to most people. Where is the lock table? How do they find it and delete it? If it picks up where ingest left off, let's reassure them of that here.
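For what that detail could look like, a minimal sketch using the `bq` CLI, assuming the lock table sits alongside the other GVS tables in the callset's BigQuery dataset (project and dataset names are illustrative):

```bash
# Confirm the lock table exists, then delete it so ingest can be kicked off again.
bq ls my-project:my_gvs_dataset | grep sample_id_assignment_lock
bq rm -f -t my-project:my_gvs_dataset.sample_id_assignment_lock
```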

1. Extract failure with OSError: Is a directory. If you point your extract to a directory that doesn’t already exist, it will not be happy about this. Simply make the directory and run the workflow again.
1. Ingest failure with: `Lock table error`
1. This means that the lock table has been created, but that the ingest has failed soon after or that perhaps during manual cleanup from another failure, some underlying data was deleted
Collaborator left a comment:

Given the lock table hasn't been mentioned in any of our documentation, I'd reword this to something like

"This error is describing an issue found with one of the tables in the GVS BigQuery dataset called the 'lock table'. It indicates that the lock table has been created, but that the ingest failed soon after or that during manual cleanup from another failure, some underlying data was deleted and now the table is in a broken state."

@kbergin (Collaborator) commented Oct 22, 2024

Generally, could you add sections for failures in the other high-level steps of GVS? Or, if GVS fails in any step besides ingest, does the user need to start over from scratch? It'd be great to indicate that, e.g. "If your workflow fails in any step after ingest, unfortunately you will need to delete your BigQuery dataset and start from the beginning."

1. Ingest failure: `There is already a list of sample names. This may need manual cleanup. Exiting.`
1. Clean up the BQ dataset manually by deleting it and recreating it fresh
1. Make sure to keep the call caching on and run it again
Collaborator left a comment:

Looks like call caching specifically needs to be off for this to be successful, based on a recent user error. I am going to bet that's true for both this error and the "max id is 0" error.
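For the manual cleanup step quoted above (delete the BQ dataset and recreate it fresh), a minimal sketch using the `bq` CLI; project, dataset, and location are illustrative, and per the comment above, the call-caching setting on the rerun should be double-checked:

```bash
# Delete the dataset and everything in it, then recreate it empty.
bq rm -r -f -d my-project:my_gvs_dataset
bq mk --dataset --location=US my-project:my_gvs_dataset
```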
