Is there a way to cache the bpe when using num_tokens_from_messages #81
Comments
As an aside note for perf: the singleton models having a mutex around them seems counterproductive. Generally, with model code you want your model to be either immutable and freely shareable across threads, or cheap enough to clone so each thread can own its own copy.
Having mutexes around the singletons seems to drastically limit their utility, given that the tokenisers run on the CPU and aren't that complex. Also, for benchmarking you should look at criterion or divan; the built-in Rust benchmarking support isn't stable and provides less useful statistics/measurements.
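For concreteness, a minimal criterion harness along those lines might look like the sketch below. The crate versions in the comment and the sample text are assumptions; it contrasts rebuilding the BPE on every call with reusing a pre-built instance:

```rust
// Cargo.toml (assumed versions): criterion = "0.5", tiktoken-rs = "0.5"
use criterion::{criterion_group, criterion_main, Criterion};
use tiktoken_rs::cl100k_base;

fn bench_bpe(c: &mut Criterion) {
    let text = "You are a helpful assistant. Summarise the following text.";

    // Uncached path: rebuild the BPE inside every iteration, mimicking a
    // function that loads the tokenizer on each call.
    c.bench_function("bpe_loaded_per_call", |b| {
        b.iter(|| {
            let bpe = cl100k_base().unwrap();
            bpe.encode_with_special_tokens(text).len()
        })
    });

    // Cached path: build the BPE once and reuse it across iterations.
    let bpe = cl100k_base().unwrap();
    c.bench_function("bpe_cached", |b| {
        b.iter(|| bpe.encode_with_special_tokens(text).len())
    });
}

criterion_group!(benches, bench_bpe);
criterion_main!(benches);
```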
For some benchmarking results to back things up: we took the impl of num_tokens_from_messages and ran it with a pre-initialized BPE. Using the version in this library we get:

[benchmark results not preserved in this capture]
I can reproduce the performance gains of using a pre-initialized BPE. In fact, it would be possible to implement something like a …
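For illustration, one shape such an API could take is a variant of num_tokens_from_messages that borrows a caller-owned CoreBPE instead of loading one internally. This is a hypothetical sketch: the function name, the Message struct, and the framing constants are assumptions, not part of tiktoken-rs (the constants follow OpenAI's published counting recipe for gpt-3.5-turbo/gpt-4-style chat formats):

```rust
use tiktoken_rs::CoreBPE;

// Hypothetical message type, for illustration only; tiktoken-rs has its
// own ChatCompletionRequestMessage type with more fields.
struct Message {
    role: String,
    content: String,
}

// Hypothetical variant that borrows a caller-owned BPE instead of loading
// one per call. The constants (assumptions) mirror OpenAI's counting recipe:
// 3 framing tokens per message, plus 3 tokens priming the assistant's reply.
fn num_tokens_from_messages_with_bpe(bpe: &CoreBPE, messages: &[Message]) -> usize {
    let tokens_per_message = 3;
    let mut num_tokens = 3; // reply is primed with <|start|>assistant<|message|>
    for m in messages {
        num_tokens += tokens_per_message;
        num_tokens += bpe.encode_ordinary(&m.role).len();
        num_tokens += bpe.encode_ordinary(&m.content).len();
    }
    num_tokens
}
```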
We've found that the num_tokens_from_messages function is a significant bottleneck in our application, and some benchmarking suggests it loads the BPE every time it's called, which dominates the runtime. Is there a way to cache this so we can load it once and not take the pain of loading it on each call, or will it require a change to tiktoken-rs' internals (and if so, what would the change have to be)?

[Section of a flamegraph showing the breakdown of num_tokens_from_messages]
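Until such a change lands, a workaround sketch is to cache the BPE yourself and tokenize through the cached instance. This assumes tiktoken-rs's cl100k_base constructor and encode_with_special_tokens (both exist, but check your version's signatures), and that CoreBPE is Send + Sync; as noted above, the crate's own singletons wrap it in a Mutex, so verify this for your version:

```rust
use std::sync::OnceLock;
use tiktoken_rs::{cl100k_base, CoreBPE};

// Process-wide cache: the BPE is built on first use and reused afterwards.
// OnceLock makes initialization thread-safe without putting a Mutex on the
// read path. Requires CoreBPE: Send + Sync (check your tiktoken-rs version).
static BPE: OnceLock<CoreBPE> = OnceLock::new();

fn bpe() -> &'static CoreBPE {
    BPE.get_or_init(|| cl100k_base().expect("failed to build cl100k_base BPE"))
}

fn count_tokens(text: &str) -> usize {
    bpe().encode_with_special_tokens(text).len()
}
```

With this in place, the construction cost is paid once per process rather than on every call.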