[BUG] ParquetToArrowDecodingHandler fails to resolve s3 paths correctly #5859

Open
2 tasks done
yherin opened this issue Oct 18, 2024 · 1 comment
Labels
bug Something isn't working

yherin commented Oct 18, 2024

Describe the bug

Due to (I believe) an upstream bug in pyarrow, ParquetToArrowDecodingHandler fails with a FileNotFoundError when it tries to resolve the path to a parquet file stored on s3.

It seems that the s3:// prefix is not passed to the final step of reading the parquet file, which results in pyarrow trying to load the file from a local path which does not exist.

apache/arrow#31812
The issue contains a description of where in pyarrow the issue occurs.
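
To make the suspected failure mode concrete, here is a minimal standard-library sketch. It is not pyarrow's or flytekit's actual code: the helper names `strip_scheme` and `resolve_locally` and the bucket path are hypothetical, and the sketch only assumes that the scheme is dropped somewhere before the final read and that the remaining string is then treated as a local path.

```python
import os
from urllib.parse import urlsplit


def strip_scheme(uri: str) -> str:
    """Return 'bucket/key' with the scheme removed (hypothetical helper)."""
    parts = urlsplit(uri)
    return parts.netloc + parts.path


def resolve_locally(path: str) -> str:
    """Mimic a reader that treats a scheme-less path as a local file."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    return path


uri = "s3://my-bucket/data/table.parquet"  # hypothetical object location
stripped = strip_scheme(uri)

try:
    resolve_locally(stripped)
except FileNotFoundError as exc:
    # The reported path no longer carries the s3:// prefix.
    print(f"FileNotFoundError: {exc}")
```

Once the prefix is gone, nothing in the remaining string says the file lives in object storage, which would match the missing s3:// in the reported error message.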

User workaround:
Use another type that interfaces well with parquet, e.g. Polars and its flyte plugin.

Expected behavior

Parquet files can be decoded successfully with ParquetToArrowDecodingHandler, even from s3.

Additional context to reproduce

Dependencies:

flyteidl==1.13.4 ; python_full_version == "3.10.11"
flytekit==1.13.8 ; python_full_version == "3.10.11"
fs==2.4.16 ; python_full_version == "3.10.11"
fsspec==2024.9.0 ; python_full_version == "3.10.11"
fsspec[s3fs]==2024.9.0 ; python_full_version == "3.10.11"
pyarrow==16.1.0 ; python_full_version == "3.10.11"

  1. Use an s3 backend in flyte
  2. Define task A which returns a pyarrow.Table object
  3. Define task B which accepts that same object as an input
  4. Run the workflow, observe that a FileNotFoundError occurs.

Remark: In the error message, the s3:// prefix is missing from the erroneous path

Screenshots

[Screenshots: ss1, ss3, ss2]

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@yherin yherin added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Oct 18, 2024
welcome bot commented Oct 18, 2024

Thank you for opening your first issue here! 🛠

@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label Oct 24, 2024
@eapolinario eapolinario self-assigned this Oct 24, 2024
Projects
Status: Backlog