Type Mismatching while Serializing Dataclass with Union #2859

Open
wants to merge 8 commits into
base: master


@mao3267 mao3267 commented Oct 24, 2024

Tracking issue

Closes flyteorg/flyte#5910

Why are the changes needed?

When to_literal converts a dataclass from a Python value to a Flyte literal, the dataclass first passes through _make_dataclass_serializable so that it can be serialized later. For dataclass fields typed as a Union, the branch guarded by if UnionTransformer.is_optional_type(python_type): takes get_args(python_type)[0] as the field's type without checking that it matches the actual value. This causes an error whenever the first argument is not the type of the value, for example in Union[None, FlyteFile], where the first argument is NoneType.
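
To make the failure mode concrete, here is a minimal sketch that uses only the standard typing module. FileStub is a hypothetical stand-in for FlyteFile and first_union_arg is an illustrative helper, not flytekit code; the sketch only shows what get_args(...)[0] returns for the two orderings.

```python
from typing import Optional, Union, get_args


class FileStub:
    """Hypothetical stand-in for FlyteFile so the sketch runs without flytekit."""


def first_union_arg(python_type):
    # Mirrors the get_args(python_type)[0] lookup described above.
    return get_args(python_type)[0]


# Optional[X] is Union[X, None], so the first argument is the useful type.
print(first_union_arg(Optional[FileStub]))
# With None listed first, the same lookup returns NoneType instead.
print(first_union_arg(Union[None, FileStub]))
```

The ordering of the Union members is preserved by get_args, which is why Optional[FlyteFile] happened to work while Union[None, FlyteFile] did not.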

What changes were proposed in this pull request?

  1. Match the value against the Union's member types instead of using the first type from get_args(python_type)
  2. Add unit tests for more possible union types
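
As a rough illustration of the first item, the sketch below picks the Union member by checking the value against each member in turn. match_union_member is a hypothetical helper written for this example; it is not the function added by this PR and it ignores the string shortcut for Flyte types.

```python
from typing import Union, get_args


def match_union_member(value, python_type):
    """Return the first Union member that `value` is an instance of, else None."""
    for member in get_args(python_type):
        if member is type(None):
            # NoneType only matches the value None.
            if value is None:
                return member
        elif isinstance(member, type) and isinstance(value, member):
            return member
    return None


field_type = Union[None, int, str]
print(match_union_member("string", field_type))  # the str member
print(match_union_member(None, field_type))      # the NoneType member
```

Returning None for an unmatched value leaves the decision about how to fail to the caller, which keeps the sketch small.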

How was this patch tested?

  1. Run on local and remote execution
  • Union[None, FlyteFile]
    flytefile_serialize.py
from dataclasses import dataclass
from flytekit import task
from flytekit.types.file import FlyteFile
from typing import Union
from flytekit.image_spec import ImageSpec

flytekit_hash = "f647bd3a1b727082210454c3f5d1e652a29b1083"
flytekit = f"git+https://github.com/mao3267/flytekit.git@{flytekit_hash}"

image_spec = ImageSpec(
    name="serialize-union-1",
    packages=[flytekit],
    builder="default",
    registry="localhost:30000",
    apt_packages=["git", "gh"],
)

@dataclass
class InnerDC:
    ff: Union[None, FlyteFile]

@dataclass
class DC:
    inner_dc: InnerDC

@task(container_image=image_spec)
def t_dc() -> DC:
    return DC(inner_dc=InnerDC(ff="s3://path"))
  • Union[None, int, str]
    serialize_union.py
from dataclasses import dataclass
from flytekit import task
from typing import Union
from flytekit.image_spec import ImageSpec

flytekit_hash = "f647bd3a1b727082210454c3f5d1e652a29b1083"
flytekit = f"git+https://github.com/mao3267/flytekit.git@{flytekit_hash}"

image_spec = ImageSpec(
    name="serialize-union-1",
    packages=[flytekit],
    builder="default",
    registry="localhost:30000",
    apt_packages=["git", "gh"],
)

@dataclass
class InnerDC:
    ff: Union[None, int, str]

@dataclass
class DC:
    inner_dc: InnerDC

@task(container_image=image_spec)
def t_dc() -> DC:
    return DC(inner_dc=InnerDC(ff="string"))
  2. Add unit tests in test_type_engine.py

Setup process

git clone https://github.com/flyteorg/flytekit.git
cd flytekit
gh pr checkout 2859
pip install -e .

Screenshots

  • Local
    • Union[None, FlyteFile]
      image
    • Union[None, int, str]
      image
  • Remote
    • Union[None, int, str]
      image

    • Union[None, int, str]
      image

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

None

Docs link

None

@mao3267 mao3267 marked this pull request as ready for review October 24, 2024 12:01
@mao3267 mao3267 changed the title [WIP] Type Mismatching while Serializing Dataclass with Union Type Mismatching while Serializing Dataclass with Union Oct 24, 2024
@@ -967,6 +967,7 @@ class TestFileStruct(DataClassJsonMixin):
b: typing.Optional[FlyteFile]
b_prime: typing.Optional[FlyteFile]
c: typing.Union[FlyteFile, None]
c_prime: typing.Union[None, FlyteFile]

Can you make this

Suggested change
- c_prime: typing.Union[None, FlyteFile]
+ c_prime: typing.Union[None, StructuredDataset, int, FlyteFile]

@wild-endeavor wild-endeavor Oct 24, 2024


Actually, can you write one more unit test for me, please? (And add it under test_dataclass.py; this file is getting too big.)

@dataclass
class A:
    x: int

@dataclass
class B:
    x: FlyteFile

Then call _make_dataclass_serializable on Union[None, A, B], where b = B(x="s3://tmp") or something.
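
A self-contained sketch of the expectation behind this request is shown below. It does not call _make_dataclass_serializable; FileStub is a hypothetical stand-in for FlyteFile and member_for is an illustrative helper. It only checks that a B instance should resolve to the B member of Union[None, A, B], not to the first member.

```python
from dataclasses import dataclass
from typing import Union, get_args


class FileStub:
    """Hypothetical stand-in for FlyteFile so the sketch runs without flytekit."""


@dataclass
class A:
    x: int


@dataclass
class B:
    x: FileStub  # dataclasses do not enforce annotations, so a str is accepted


def member_for(value, union_type):
    # Skip NoneType, then return the first member the value is an instance of.
    candidates = [m for m in get_args(union_type) if m is not type(None)]
    return next((m for m in candidates if isinstance(value, m)), None)


b = B(x="s3://tmp")
print(member_for(b, Union[None, A, B]))
```

A real test would additionally assert on the value returned by the transformer, which this sketch leaves out.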


codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 81.81818% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.15%. Comparing base (3fc51af) to head (4894ae5).
Report is 8 commits behind head on master.

Files with missing lines        Patch %   Lines
flytekit/core/type_engine.py    81.81%    0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2859       +/-   ##
===========================================
+ Coverage   45.53%   79.15%   +33.61%     
===========================================
  Files         196      196               
  Lines       20418    20545      +127     
  Branches     2647     2647               
===========================================
+ Hits         9298    16262     +6964     
+ Misses      10658     3546     -7112     
- Partials      462      737      +275     


def get_expected_type(python_val: T, types: tuple) -> Type[T | None]:
    if len(set(types) & {FlyteFile, FlyteDirectory, StructuredDataset}) > 1:
        raise ValueError("Cannot have two Flyte types in a Union type")

can we make the error

Cannot have more than one Flyte type in the Union when attempting to use the string shortcut. Please specify the full object (e.g. FlyteFile(...)) instead of just passing a string.

instead?
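
For reference, the guard with the proposed wording could look roughly like the sketch below. FileStub, DirectoryStub and DatasetStub are hypothetical stand-ins for FlyteFile, FlyteDirectory and StructuredDataset, and check_union_members is an illustrative name; only the set-intersection check and the message text come from this thread.

```python
class FileStub:
    """Hypothetical stand-in for FlyteFile."""


class DirectoryStub:
    """Hypothetical stand-in for FlyteDirectory."""


class DatasetStub:
    """Hypothetical stand-in for StructuredDataset."""


FLYTE_LIKE_TYPES = {FileStub, DirectoryStub, DatasetStub}

AMBIGUOUS_UNION_MESSAGE = (
    "Cannot have more than one Flyte type in the Union when attempting to use "
    "the string shortcut. Please specify the full object (e.g. FlyteFile(...)) "
    "instead of just passing a string."
)


def check_union_members(types: tuple) -> None:
    # A bare string could stand for any of these types, so more than one is ambiguous.
    if len(set(types) & FLYTE_LIKE_TYPES) > 1:
        raise ValueError(AMBIGUOUS_UNION_MESSAGE)


check_union_members((type(None), int, FileStub))  # one Flyte-like type: no error
```

Under this check, a Union that lists both a dataset type and a file type is rejected before any matching is attempted.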

@wild-endeavor wild-endeavor left a comment


Thank you! I left one more comment about updating the error message. Otherwise this looks good.

@@ -967,6 +967,7 @@ class TestFileStruct(DataClassJsonMixin):
b: typing.Optional[FlyteFile]
b_prime: typing.Optional[FlyteFile]
c: typing.Union[FlyteFile, None]
c_prime: typing.Union[None, FlyteFile]

@mao3267 mao3267 Oct 26, 2024


We did not expect typing.Union[None, StructuredDataset, int, FlyteFile] to work in our tests, because a Union with multiple Flyte types is ambiguous, but we anticipated that something like typing.Union[None, int, FlyteFile] would work. However, after tracing the code, I found some bugs in how msgpack handles unions with more than two types. cc @Future-Outlier


Successfully merging this pull request may close these issues.

[BUG] [Flytekit] Mismatching Type while serializing Union Types in _make_dataclass_serializable