[RFC] Renaming Components and Adding Branches to Workloads Repository #324

Open
IanHoang opened this issue Jun 6, 2023 · 6 comments
Labels
enhancement New feature or request RFC Request for comment on major changes

Comments

@IanHoang
Collaborator

IanHoang commented Jun 6, 2023

Synopsis

This is an RFC for a proposal to improve the nomenclature of several components within OpenSearch Benchmark to make them conform to standard terminology, for better readability and ease of maintenance. Although renaming components might seem like a small change, the proposed replacements will impact users using legacy versions of OSB (across at least eight minor versions of OSB). This RFC addresses our motivation, proposes suitable replacements, and recommends ways to mitigate inconveniences brought on by these replacements.

Meta Tasks: #325


Motivation

Leading up to the release of OSB 1.0.0, members of the community have spent time identifying and resolving various issues across OSB’s code base. When members dove into the code base to understand different components, many found that a handful of components were too verbose, inconsistently formatted, lacked clarity, or were no longer appropriate in their context.

In addition, community members have received recurring questions and noticed confusion about these components: what they mean, how they work, and how they interact. Since OSB is still at an early stage of development, we propose that the community settle on suitable replacements and rename these components as soon as possible. It would be better to rename these components now rather than later, when additional features and workloads will have been incorporated into OSB. We can also take this time to determine whether other major components should be renamed.


Recommendations

We have identified a list of OSB components that are in question. Feel free to add other components and explain why they should be renamed and what they should be renamed to.

Naming Conventions: names that are exposed to customers should use hyphens (-) rather than underscores (_). Literals within the codebase are restricted to underscores by the programming language's syntax.

  • execute-test, test_executions
    • Rename to run, test_runs.
    • Rationale: We would like to avoid the term execute. Start was another contender. However, using start implies that users need to run stop afterwards, which does not apply to OSB since it stops automatically after every test.
  • test_procedures
    • Rename to scenarios
    • Rationale: Scenarios might be a better alternative because, by definition, a scenario is a sequence of events, which is in line with the official definition of test_procedures:
      • Users may want to use the same workload but perform the operations in a different order. Instead of creating a new workload or reorganizing the order of operations directly, you can provide test_procedures to vary workload operations.
  • provision_configs, provision_config_instances
    • Rename to cluster_configs.
    • Rationale: We should describe it as it is and there is no simpler form to represent cluster configurations than cluster_configs. It is less verbose.
  • results_publishing
    • Rename to reporting
    • Rationale: When referring to the activity of reporting anything, it’s more common to use the term reporting rather than using publishing.
  • load_worker_coordinator_hosts, node-ip, coordinator-ip, (distributed workload generation)
    • Rename to worker-hosts to align with coordinator-hosts and target-hosts.
    • Rationale: Simplifies the parameter. We originally suggested worker-ips, but @gkamat pointed out that this feature should accept both hosts and IPs (currently, OSB only accepts IPs for this parameter, unlike its --target-hosts parameter, which accepts both). We will also need to change node-ip and coordinator-ip to node-host and coordinator-host to indicate that they accept both IPs and hosts, so that they behave like the --target-hosts parameter.
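
To make the proposed renames easier to scan, the list above can be written down as a lookup table. This is only a sketch: the dictionary and the helper names (`RENAMES`, `rename`, `to_cli_name`, `to_code_name`) are hypothetical and are not part of OSB. The helpers illustrate the naming convention of hyphens for customer-facing names and underscores inside the codebase.

```python
# Legacy name -> proposed name, copied from the recommendations above.
# Hypothetical table for illustration; this is not OSB's actual code.
RENAMES = {
    "execute-test": "run",
    "test_executions": "test_runs",
    "test_procedures": "scenarios",
    "provision_configs": "cluster_configs",
    "provision_config_instances": "cluster_configs",
    "results_publishing": "reporting",
    "load_worker_coordinator_hosts": "worker-hosts",
    "node-ip": "node-host",
    "coordinator-ip": "coordinator-host",
}


def rename(legacy: str) -> str:
    """Return the proposed replacement, or the input unchanged if it is not a legacy name."""
    return RENAMES.get(legacy, legacy)


def to_cli_name(name: str) -> str:
    """Customer-facing spelling: hyphens instead of underscores."""
    return name.replace("_", "-")


def to_code_name(name: str) -> str:
    """In-code spelling: underscores, because identifiers cannot contain hyphens."""
    return name.replace("-", "_")
```

For example, `to_cli_name(rename("test_executions"))` gives the customer-facing spelling of the renamed term, while `to_code_name("worker-hosts")` gives the spelling a literal in the codebase would use.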

Avoiding Breaking Changes with Legacy Versions and New Versions of OSB

Some component names in the OSB repository also live in the workloads repository. Because of this, altering any of them will cause breaking changes in the workloads repository. Once we have finalized the names and replaced them in the OSB repository, we will need a way for OSB users to seamlessly use legacy versions (0.0.1 to 1.0.0, or any versions released before the changes have been implemented) without encountering issues. Of the options below, we are leaning towards option 1.

1. Adding More Branches

Add new branches containing updated changes while preserving the original branches. By adding more branches, we keep the legacy formats and updated formats in the same repository, making it much easier to manage and, eventually, deprecate the legacy branches in the future. The branches are currently named 1,2,3,6,7 and refer to the first three versions of OpenSearch and versions 6 and 7 of Elasticsearch. To distinguish the new branches from the legacy branches, the new branches will follow the naming convention - (e.g. OS-1 and OS-2 for OpenSearch versions 1.X and OpenSearch versions 2.X). Additionally, this new naming convention will clear up any previous confusion users had with what 1,2,3,6,7 represented in the workloads repository.

The only drawback we can see with this idea is that there will be an excessive number of branches. Despite this, managing extra branches for a short period of time is still more appealing than maintaining an extra repository, adding a layer of directories, or forcing all OSB users to upgrade to the latest version.

2. Creating a Separate Repository (legacy-workloads)

Create a new repository with the new changes. However, this would be a nuisance: we would have an additional repository to maintain for a short period of time, since we would eventually deprecate the repository containing the legacy formats.

3. Creating distinct directories in workloads

Adds an additional layer between the workload name and its contents. When users have OSB installed on their machine, the path to get to the contents of each workload is already long and adding an additional layer would make it even more cumbersome.

4. Deprecating all versions prior to newly-released version with these changes.

This is another option but the least effective, as it forces all users of OSB to upgrade.


The changes proposed above are intended to be incorporated prior to the next major release (2.X). It's important that these changes occur before other major features are built on top of these pre-existing components. Although they may be a brief inconvenience for some, the proposed changes will benefit the long-term vision for OpenSearch Benchmark and the OpenSearch community.

We are looking forward to your feedback and support for this proposal.

How Can You Help?

  • Any general comments about the overall direction are welcome.
  • Help out on the implementation! Check out the issues page for work that is ready to be picked up.
@IanHoang IanHoang added enhancement New feature or request RFC Request for comment on major changes labels Jun 6, 2023
@IanHoang IanHoang self-assigned this Jun 6, 2023
@IanHoang IanHoang removed the untriaged label Jun 6, 2023
@IanHoang IanHoang removed their assignment Oct 18, 2023
@IanHoang
Collaborator Author

Meta issue #325

@IanHoang
Collaborator Author

Maintainers have had a discussion regarding renaming components. We have created a document discussing the plan and timeline for 2.0.0. Community members are welcome to read and comment on this RFC with any ideas or additional proposals. We will finalize the list of items to rename and update this RFC and the META issue tracker next Wednesday.

@andrross
Member

I am personally very supportive of simplifying and standardizing terms in the OSB interface. Assuming you implement this and release a 2.0.0 major version, how will you ensure that conventions and standards are followed going forward for new features? (I don't have any great answers for you so please do share any mechanisms you come up with!)

@IanHoang
Collaborator Author

@andrross To ensure that we adhere to new conventions and standards, here are a few ideas:

  • Support both legacy and renamed terminology for a period of time. For example, suppose a user is on OSB 2.0.0 and accidentally references a legacy term, such as test procedure, in the CLI. Under the hood, OSB should have a "symlink" that detects the legacy terms that were used and connects them to the renamed terms' pathways. We can support this for a period of time and eventually post an announcement on public Slack / Documentation / GitHub giving the date on which we will discontinue support for legacy terminology. Afterwards, any time the user tries to use legacy terms in the command line, OSB will produce an error and suggest that the user use the renamed terms.
  • As a slight alternative to the above, OSB can automatically detect the legacy term, suggest the renamed version in the command line, and prompt the user with a y/n to confirm that this is what they intended. That way we don't force the renamed term on the user if that's not what they intended. With enough exposure to OSB suggesting the renamed terms, users will learn to use them over the legacy terms. To illustrate this suggestion mechanism, here is an example output for a user who might use --test-procedure instead of --scenario (the renamed version of test procedure):
$ opensearch-benchmark execute-test --test-procedure="searches-only" ....
> Warning: --test-procedure is not a valid parameter
> Did you mean --scenario? (y/n)
  • Workloads only use one legacy term, which is test-procedures. We'll add a check to ensure that the terms found in the workload files match what the OSB version expects (e.g. if users use 2.X, OSB will verify that the workload files it reads use the updated terminology; if not, it will throw an error that points to the offending file and suggests a correction).
  • Many of our users continuously reference our documentation on OpenSearch.org. By the time we release 2.0.0, we're aiming to release updated documentation that gives users the ability to toggle between versions (i.e. users can toggle between 1.X and 2.X versions of documentation)
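
The first idea in the list above (the "symlink" from legacy terms to renamed terms) could be sketched roughly as follows. This is a minimal illustration, not OSB's implementation: `translate_args`, `LEGACY_FLAGS`, and `LEGACY_SUBCOMMANDS` are hypothetical names, and the sketch rewrites the raw argument list before any real argument parsing would take place. It warns and continues rather than prompting y/n as in the second idea.

```python
import sys

# Hypothetical alias tables: legacy spelling -> renamed spelling.
LEGACY_FLAGS = {"--test-procedure": "--scenario"}
LEGACY_SUBCOMMANDS = {"execute-test": "run"}


def _warn_to_stderr(message):
    print(message, file=sys.stderr)


def translate_args(argv, warn=_warn_to_stderr):
    """Return a copy of argv with legacy terms replaced, warning once per replacement."""
    translated = []
    for index, token in enumerate(argv):
        # Flags may arrive as "--flag=value" or as a bare "--flag".
        name, sep, value = token.partition("=")
        if name.startswith("--"):
            new_name = LEGACY_FLAGS.get(name)
        elif index == 0:
            # Only the first token is treated as a subcommand.
            new_name = LEGACY_SUBCOMMANDS.get(token)
        else:
            new_name = None

        if new_name is None:
            translated.append(token)
            continue

        warn(f"Warning: {name} is a legacy term; using {new_name} instead")
        translated.append(new_name + sep + value)
    return translated
```

Once support for legacy terms is discontinued, the same tables could drive the error path instead: raise an error where the warning is emitted and include `new_name` as the suggestion.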

These are just some ideas but if you or anyone else have any ideas or comments, we're open to hearing them!

@andrross
Member

@IanHoang I'm talking more about ongoing development. Let's say a contributor comes along and adds a new feature that uses the phrase "execution" as a part of a CLI option for that feature. If you specifically review that change then you will almost certainly work to use the phrase "run" instead. How can you ensure that all such changes get the right level of scrutiny to make sure the terminology in use remains consistent?

@IanHoang
Collaborator Author

IanHoang commented Jun 20, 2024

@andrross To ensure that the renamed terminology remains consistent in ongoing development, maintainers and reviewers will need to review naming thoroughly to ensure that contributors aren't reverting to the legacy terminology. Luckily, OSB doesn't have an extensive list of unique terms like test procedures or test executions, so it shouldn't be too daunting. Upholding code review best practices (keeping PRs to a reasonable size, inspecting naming, checking functionality, etc.) would also improve our ability to catch cases where contributors use legacy terms.

One idea to improve this and to make the effort a little less manual is to use a custom GitHub Action. This GitHub Action would search for legacy terminology in PRs and comment on it (similar to how the style-job GitHub Action is used on PRs in the documentation repository). However, this would only detect when legacy terminology is used; it would not detect situations where contributors refer to the renamed term but in a completely new way. The only way I can imagine catching those situations is through a thorough review.
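
As a rough sketch of the check such an action might run, the function below scans the added lines of a unified diff for legacy terms and reports a suggested replacement. Everything here is an assumption for illustration: the term list uses singular stems derived from the renames proposed in this RFC, `find_legacy_terms` is a hypothetical name, and wiring the result into a PR comment is left out.

```python
import re

# Hypothetical stems (so singular and plural both match) -> suggested replacement.
LEGACY_TERMS = {
    "execute-test": "run",
    "test_execution": "test_run",
    "test_procedure": "scenario",
    "provision_config": "cluster_config",
    "results_publishing": "reporting",
}


def _pattern(term):
    # Treat "-" and "_" as interchangeable so CLI and in-code spellings both match.
    parts = re.split(r"[-_]", term)
    return re.compile("[-_]".join(re.escape(part) for part in parts), re.IGNORECASE)


PATTERNS = {term: _pattern(term) for term in LEGACY_TERMS}


def find_legacy_terms(diff_text):
    """Return (diff_line_number, legacy_term, suggestion) for each added line using a legacy term."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines, and skip the "+++ b/<file>" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for term, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, term, LEGACY_TERMS[term]))
    return findings
```

A check like this only flags known legacy spellings on added lines, so it shares the limitation described above: it cannot tell when a renamed term is being used in a new, inconsistent way.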

Although it's not a perfect solution, a combination of the two (an automated GitHub Action plus maintainers and contributors following PR best practices) should help ensure that the renamed terminology remains consistent.

@IanHoang IanHoang pinned this issue Jul 2, 2024
Projects
Status: In Progress