Standardises GTDB execution and allows pre-uncompressed GTDB input #477
Conversation
Need to test; currently GTDB is not being executed at all.
First pass tests:
Remember: remove print and dumps!
TODO: same for BUSCO
Changed my mind: BUSCO is a little more tricky, so I will do that in a follow-up PR (if I do more than just accepting directory input).
Changes look fine to me, but I don't have time (or a downloaded copy of the GTDB database) to test them myself!
Working: but broken database
Can I check what this means?
It was saying something about not being able to find the 'tigram database' or something like that, regardless of whether I used the auto-downloaded or manually downloaded GTDB release 202 🤷 Unless the error message was a bit funny and it actually meant it couldn't find features within the data. But regardless, the module definitely executed and got halfway through :)
Poking around - the r202 database release is listed with a maximum GTDB version compatibility of 1.7.0 - the version in mag dev at the moment is 2.1.1. Worth retrying with the R214 or R207v2 release, which are listed as compatible? A little bit of googling suggests Tigram is some kind of HMM database - maybe this has since been added to the GTDB database. Might be we have to bump the default database version as well.
Huh, interesting... I will check that later then. I guess somehow the module got updated at some point but not the URL?
Thanks for the investigation :D (will likely finish next week though, as I'm teaching all this week)
@jfy133 @prototaxites I think it would be great to get the r214 database set as default, since it is a pretty significant increase over r207. I'm happy to work on updating that this week if that would be helpful!
That would be great @CarsonJM! I honestly think it will just take updating the URL. If you could also test on your own (small) data for the three database cases: auto download, supplying a tar.gz, and an unpacked tar (i.e. a directory), that would also be really helpful. The database is too large for the GitHub CI nodes so I fear it's not tested sufficiently :(, thus the more manual tests the better. We should maybe also consider a release checklist, e.g. running a range of full AWS/local HPC runs with all the large databases activated...
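For illustration only, here is a minimal sketch of what the three database input modes discussed above could look like as parameter settings. The parameter name `gtdb` and all paths are assumptions for this example, not taken from the pipeline source:

```groovy
// Sketch of the three GTDB database input modes, written as Nextflow
// parameter settings. The parameter name `gtdb` and the paths below are
// assumptions for illustration only.

// 1. Auto-download: leave the parameter at its default (a remote URL),
//    so the pipeline fetches and unpacks the archive itself.
// params.gtdb = <default download URL defined by the pipeline>

// 2. Pre-downloaded archive: point the parameter at a local .tar.gz,
//    which the pipeline unpacks before running GTDB-Tk.
params.gtdb = "/databases/gtdbtk_r214_data.tar.gz"

// 3. Pre-uncompressed database: point the parameter at an already
//    unpacked directory, skipping the extraction step entirely.
// params.gtdb = "/databases/gtdbtk_r214/"
```

Which of these forms the parameter actually accepts is exactly what the manual tests requested above are meant to confirm.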
Thanks for the guidance @jfy133 I will work on that this week and keep you all posted!
Sorry for falling behind on this. I made the code changes and started some tests last Thursday, but sorely underestimated the amount of time I would need to request for downloading/unpacking this database. Re-running the tests now!
Finished running all three tests (auto-download, .tar.gz, and directory); all worked great, and it looks like all CI tests are going to pass as well. One thought on this would be to add
Thanks for adding that @CarsonJM! Why do you think that process needs lots of memory? Isn't it just running
Good point! @jfy133 I initially ran this without modifying the resource request, and it failed after > 4 hrs (our default queue's max runtime). When I requested 16 threads and 100 GB mem, it completed in 40 min. After your comment I looked at the trace and the mem requirement is definitely low! Would it be running faster because more cores were available? Trace below:
(execution trace table)
AFAIK it's also single core... so I have no idea. Maybe the first time, the node you were sent to had lots of RAM-intensive jobs going on? I didn't have that issue myself personally (40 min every time) when running on my laptop, so I'm more inclined to leave it as is for now. Otherwise, if you're happy with the PR @CarsonJM please give the ✔️ and then we can merge, and dare I say it, make the release?
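As a side note on the resource discussion above, a user whose scheduler kills the default allocation could bump the resources for a single step with a custom config. The sketch below shows the general mechanism; the process selector name `GTDBTK_CLASSIFYWF` and the values are placeholders, not confirmed against the pipeline's actual process names or defaults:

```groovy
// custom.config — sketch of overriding resources for one process.
// 'GTDBTK_CLASSIFYWF' is a placeholder selector; check the pipeline's real
// process names in the module sources before using this.
process {
    withName: 'GTDBTK_CLASSIFYWF' {
        cpus   = 16
        memory = '100.GB'
        time   = '8.h'
    }
}
```

This would then be passed to the run with `-c custom.config`, keeping the pipeline's own defaults untouched for everyone else.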
To close #424
PR checklist
- Code lints (`nf-core lint`).
- Test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- `docs/usage.md` is updated.
- `docs/output.md` is updated.
- `CHANGELOG.md` is updated.
- `README.md` is updated (including new tool citations and authors/contributors).