errors getting raw url as part of RO Bundle for not GitHub / GitLab repos #488

Open
mr-c opened this issue Dec 31, 2022 · 8 comments

@mr-c
Member

mr-c commented Dec 31, 2022

{'url': 'https://gitlab.bsc.es/lrodrig1/structuralvariants_poc.git', 'branch': '1.0.7', 'path': 'structuralvariants/cwl/subworkflows/bwa_index.cwl'}

2022-12-31 16:30:23,369 ERROR [task-4] org.commonwl.view.researchobject.ROBundleService: Could not pack workflow when creating Research Object: While fetching https://gitlab.bsc.es/lrodrig1/structuralvariants_poc.git, got content-type of 'text/html'. Expected one of ['text/plain', 'application/json', 'text/vnd.yaml', 'text/yaml', 'text/x-yaml', 'application/x-yaml', 'application/octet-stream'].
ERROR Tool definition failed validation:
https://gitlab.bsc.es/lrodrig1/structuralvariants_poc.git:5:17: mapping values are not allowed here

org.commonwl.view.cwl.CWLValidationException: While fetching https://gitlab.bsc.es/lrodrig1/structuralvariants_poc.git, got content-type of 'text/html'. Expected one of ['text/plain', 'application/json', 'text/vnd.yaml', 'text/yaml', 'text/x-yaml', 'application/x-yaml', 'application/octet-stream'].
ERROR Tool definition failed validation:
https://gitlab.bsc.es/lrodrig1/structuralvariants_poc.git:5:17: mapping values are not allowed here

	at org.commonwl.view.cwl.CWLTool.runCwltoolOnWorkflow(CWLTool.java:121)
	at org.commonwl.view.cwl.CWLTool.getPackedVersion(CWLTool.java:60)
	at org.commonwl.view.researchobject.ROBundleService.createBundle(ROBundleService.java:204)
	at org.commonwl.view.researchobject.ROBundleFactory.createWorkflowRO(ROBundleFactory.java:80)
	at org.commonwl.view.researchobject.ROBundleFactory$$FastClassBySpringCGLIB$$c15d1fdc.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:793)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:763)
	at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

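This happens because https://gitlab.bsc.es is not detected as a GitLab-based host, so the raw URL lookup falls through to the default case and returns the repository URL unchanged: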

default:
return repoUrl;

A harder example: {'url': 'https://git.wur.nl/unlock/cwl.git', 'branch': 'master', 'path': 'cwl/workflows/workflow_indexbuilder.cwl'} (also GitLab based host)
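
For reference, a minimal sketch of the raw URL one would expect for these two repositories once the host is recognised as GitLab. It assumes the `/-/raw/<ref>/<path>` layout used by current GitLab versions (older installations may differ); `GitLabRawUrl` and `rawUrl` are hypothetical names, not part of the existing code.

```java
/** Hypothetical helper for illustration; not part of the existing code base. */
final class GitLabRawUrl {

    /**
     * Builds a raw-file URL for a repository on a GitLab host.
     * Assumption: the host uses the "/-/raw/<ref>/<path>" layout.
     */
    static String rawUrl(String repoUrl, String branch, String path) {
        // Strip trailing slashes first, then the ".git" suffix of the clone URL.
        String base = repoUrl.replaceAll("/+$", "").replaceAll("\\.git$", "");
        return base + "/-/raw/" + branch + "/" + path;
    }
}
```

With the first example above, `rawUrl` would produce a URL under `https://gitlab.bsc.es/lrodrig1/structuralvariants_poc/-/raw/1.0.7/` rather than the bare clone URL that is currently passed on.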

This raw URL is needed to pack the workflow; why aren't we using the local git checkout?

addAggregation(bundle, manifestAnnotations,
"merged.cwl", cwlTool.getPackedVersion(rawUrl));

@mr-c
Member Author

mr-c commented Dec 31, 2022

Perhaps GitLab style hosts could be detected by a well known path or API?

@kinow
Member

kinow commented Dec 31, 2022

> Perhaps GitLab style hosts could be detected by a well known path or API?

Oh, that sounds like an interesting problem. Let me check if I can find a way to tell whether a URL is GitHub, BitBucket, or GitLab (I have an idea on how to find it 😬 )

@kinow
Member

kinow commented Dec 31, 2022

Alright, my first idea flopped. I remembered that in Jenkins you could use GitLab, BitBucket, or GitHub. I thought they had already found a way to identify the server for a given URL, but it looks like they only identify the cloud versions (i.e. github.com/*, gitlab.com/*, and bitbucket.org/*).

@kinow
Member

kinow commented Dec 31, 2022

My second idea was to identify the repository based on refs. Pull requests generate a refs/pull/$ID, and merge requests generate something like refs/merge-requests/$ID. I think in BitBucket it's something else, like refs/pull-requests/$ID.

But if you have pull/merge requests disabled, or if you have no open requests, then I believe the git client won't list anything. I had a look at git show-ref but couldn't find a way to rely on refs to identify the repo.

Maybe we could query for commits?

In a GitHub repository, the URL will be something like: https://<host>/<org>/<repo>/commits/master. In a GitLab repository that will be https://<host>/<org>/<repo>/-/commits/master. In BitBucket it's https://<host>/<org>/<repo>/commits/branch/master. So in theory a curl -I and a check for status 200 could be used to identify the server type of a given repository URL?

I think the logic would be to first check the host name for GitHub.com, GitLab.com, or BitBucket.org. If that fails, then curl this commits URL for each server type. Finally, throw an error if we still can't identify the server type.
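
A rough sketch of that order of checks, assuming the three commits URL layouts listed above. `GitServerGuess` and its methods are hypothetical names, and the HTTP check is passed in as a predicate so the sketch stays independent of any particular HTTP client:

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical helper for illustration; not existing project code. */
final class GitServerGuess {

    enum ServerType { GITHUB, GITLAB, BITBUCKET }

    /** Step 1: recognise the cloud services by host name; null if unknown. */
    static ServerType fromHost(String repoUrl) {
        String host = URI.create(repoUrl).getHost();
        if (host == null) {
            return null;
        }
        switch (host.toLowerCase(Locale.ROOT)) {
            case "github.com": return ServerType.GITHUB;
            case "gitlab.com": return ServerType.GITLAB;
            case "bitbucket.org": return ServerType.BITBUCKET;
            default: return null;
        }
    }

    /** Step 2: the commits URL to probe for each server type, in probe order. */
    static Map<ServerType, String> commitsUrls(String repoUrl, String branch) {
        String base = repoUrl.replaceAll("/+$", "").replaceAll("\\.git$", "");
        Map<ServerType, String> urls = new LinkedHashMap<>();
        urls.put(ServerType.GITHUB, base + "/commits/" + branch);
        urls.put(ServerType.GITLAB, base + "/-/commits/" + branch);
        urls.put(ServerType.BITBUCKET, base + "/commits/branch/" + branch);
        return urls;
    }

    /**
     * Host name first, then the commits URLs, then give up.
     * returns200 is expected to do the equivalent of "curl -I" and
     * report whether the status was 200.
     */
    static ServerType detect(String repoUrl, String branch, Predicate<String> returns200) {
        ServerType known = fromHost(repoUrl);
        if (known != null) {
            return known;
        }
        for (Map.Entry<ServerType, String> e : commitsUrls(repoUrl, branch).entrySet()) {
            if (returns200.test(e.getValue())) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("Could not identify the server type of " + repoUrl);
    }
}
```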

WDYT @mr-c?

@mr-c
Member Author

mr-c commented Dec 31, 2022

> WDYT @mr-c?

GitHub is only github.com (I know of no other public installations); likewise for Bitbucket. Therefore only self-hosted GitLab needs detecting. Maybe try https://hostname/api/v4/projects (which doesn't require a token) and use a valid response as an indicator?
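
A minimal sketch of such a probe using the JDK's `java.net.http` client. `GitLabApiProbe` and its methods are hypothetical names, and "valid response" is taken here to mean HTTP 200 with a JSON content type, which is an assumption rather than a settled rule:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper for illustration; not existing project code. */
final class GitLabApiProbe {

    /** Builds <scheme>://<host>/api/v4/projects from any URL on that host. */
    static URI projectsEndpoint(String repoUrl) {
        URI repo = URI.create(repoUrl);
        return URI.create(repo.getScheme() + "://" + repo.getAuthority() + "/api/v4/projects");
    }

    /** Assumption: HTTP 200 plus a JSON content type counts as a valid response. */
    static boolean looksLikeGitLab(String repoUrl) throws IOException, InterruptedException {
        // The default client does not follow redirects, so a redirect to a
        // sign-in page shows up as a 3xx status instead of a 200.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(projectsEndpoint(repoUrl))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        // Only the status and headers matter here, so the body is discarded.
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        String contentType = response.headers().firstValue("Content-Type").orElse("");
        return response.statusCode() == 200 && contentType.startsWith("application/json");
    }
}
```

Note that this builds the endpoint from the host alone, so it would not cover a GitLab instance installed under a sub-path.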

@kinow
Member

kinow commented Dec 31, 2022

> GitHub is only github.com (I know of no other public installations);

At NIWA we thought about the enterprise option, but it was too expensive at the time (some of the code was for-profit, or mixed). Unis and not for profit had a special price for the enterprise if I recall? I remember we had access to the silver plan as we were a research institution (at that time only silver gave private repos, nowadays everybody has access to it), so that's what we got.

Now I think besides the big FAANG companies, not for profit and some unis might have github enterprise installed, e.g.

But I think we can skip it and implement it later if needed, especially as I am not sure if any of these unis host public repositories (in GitLab I can choose whether my projects are public/private/internal, no idea about github enterprise).

> likewise for bitbucket.

A telco I worked with briefly in New Zealand used the BitBucket server (I think other NZ companies used it due to Atlassian being from Aus - the complete Atlassian suite with confluence/jira/bitbucket/bamboo/etc wasn't very expensive some time ago).

> Therefore it is just self-hosted GitLab that needs detecting; so maybe try https://hostname/api/v4/projects (which doesn't require a token) and use a valid response as an indicator?

Oh, using the API is a good idea, didn't think about that one. Could be too. Not sure if everybody is on v4. I assume when a v5 is available, v4 will keep working too (? or maybe users can enable/disable older api versions?), so this might be a good idea.

https://mmb.irbbarcelona.org/gitlab/gelpi/CMIP is a public GitLab project, but https://mmb.irbbarcelona.org/gitlab/api/v4/projects redirects to the sign-up page. Now, if you try the v3... that works 🤔

I think they may be using an older version of GitLab? Note, however, that v2 does not work 😄

My URL /-/commits/master also appears to be V4 only, as that also doesn't work for that irbbarcelona.org repo 😄

@mr-c
Member Author

mr-c commented Jan 1, 2023

TIL! I thought all hosted or on-premise GitHub/Bitbucket services were private

> Oh, using the API is a good idea, didn't think about that one. Could be too. Not sure if everybody is on v4. I assume when a v5 is available, v4 will keep working too (? or maybe users can enable/disable older api versions?), so this might be a good idea.
>
> https://mmb.irbbarcelona.org/gitlab/gelpi/CMIP a public GitLab project, but the https://mmb.irbbarcelona.org/gitlab/api/v4/projects redirects to the sign up page. Now, if you try the v3... that works 🤔
>
> I think they may be using an older version of GitLab? Note, however, that v2 does not work 😄
>
> My URL /-/commits/master also appears to be V4 only, as that also doesn't work for that irbbarcelona.org repo 😄

There could be enough signal even in "failed" attempts. For example, curl -v https://mmb.irbbarcelona.org/gitlab/api/v4/projects shows that a _gitlab_session cookie is set, even though it redirects.

For GitLab detection, I suggest trying a variety of endpoints, checking for a GitLab cookie, a valid response, or another signal.
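
A minimal sketch of how such signals could be combined once a response is in hand. `GitLabSignals` is a hypothetical name, and the two signals checked (a `_gitlab_session` cookie, or HTTP 200 with a JSON content type) are assumptions drawn from the observations in this thread, not a complete list:

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; not existing project code. */
final class GitLabSignals {

    /**
     * Returns true if a response carries at least one GitLab signal:
     * a _gitlab_session cookie (even on a redirect), or a 200 with JSON.
     */
    static boolean looksLikeGitLab(int status, Map<String, List<String>> headers) {
        boolean sessionCookie = false;
        boolean json = false;
        for (Map.Entry<String, List<String>> header : headers.entrySet()) {
            if (header.getKey() == null) {
                continue; // some HTTP clients expose the status line under a null key
            }
            // Header names are case-insensitive, so normalise before comparing.
            String name = header.getKey().toLowerCase(Locale.ROOT);
            for (String value : header.getValue()) {
                if (name.equals("set-cookie") && value.startsWith("_gitlab_session=")) {
                    sessionCookie = true;
                }
                if (name.equals("content-type") && value.startsWith("application/json")) {
                    json = true;
                }
            }
        }
        return sessionCookie || (status == 200 && json);
    }
}
```

Keeping the decision separate from the HTTP call should make it easy to add further endpoints or signals later without touching the request code.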

@kinow
Member

kinow commented Jan 1, 2023

> For GitLab detection, I suggest trying a variety of endpoints, checking for a gitlab cookie, valid response, or other signal;

Sounds good to me! We can then iterate and improve based on user feedback.
