Is your feature request related to a problem? Please describe.
As a user of the S3 scan source, I have a bucket with 100 million objects. The current S3 scan source cannot handle this many objects: the partition supplier returns every object as a single list of partitions, which can cause out-of-memory errors. Additionally, if the S3 scan supplier fails at any point, no partitions are created, because the supplier must return all partitions before any of them are created in the coordination store.
Describe the solution you'd like
I would like the PartitionSupplier functions to be able to pass partitions back to the source coordinator for creation as they are discovered. Instead of holding every object in memory during a scan, the supplier would make the call to create a partition right after the scan finds each object.
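A minimal sketch of the requested shape, assuming the supplier is handed a callback that creates one partition at a time. The names `StreamingPartitionSupplier`, `scanSupplier`, and the `bucket|key` partition key format are hypothetical and only illustrate the idea; this is not the existing Data Prepper source coordination API, and the iterator of keys stands in for a paginated S3 listing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class StreamingPartitionSketch {

    /** Hypothetical partition descriptor: one partition per S3 object. */
    static final class PartitionIdentifier {
        final String partitionKey;

        PartitionIdentifier(String partitionKey) {
            this.partitionKey = partitionKey;
        }
    }

    /**
     * Hypothetical streaming supplier: instead of returning a List of
     * partitions, it pushes each partition to a callback as it is found.
     */
    interface StreamingPartitionSupplier {
        void supply(Consumer<PartitionIdentifier> createPartition);
    }

    /**
     * Builds a supplier over an iterator of object keys. Each key is handed
     * to the callback immediately, so only one partition is held in memory
     * at a time, and partitions created before a failure are not lost.
     */
    static StreamingPartitionSupplier scanSupplier(String bucket, Iterator<String> objectKeys) {
        return createPartition -> {
            while (objectKeys.hasNext()) {
                String key = objectKeys.next();
                createPartition.accept(new PartitionIdentifier(bucket + "|" + key));
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for the coordination store: records each created partition key.
        List<String> coordinationStore = new ArrayList<>();
        Iterator<String> keys = List.of("a.json", "b.json", "c.json").iterator();

        scanSupplier("my-bucket", keys).supply(p -> coordinationStore.add(p.partitionKey));

        System.out.println(coordinationStore);
    }
}
```

With this shape, a failure partway through the scan would still leave the partitions discovered so far in the coordination store, which addresses the second half of the problem described above.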
graytaylor0 changed the title from "S3 Scan Source partition supplier creates partitions in memory" to "S3 Scan Source partition supplier creates partitions in memory and a failure causes no partitions to be created" on Jun 6, 2024