
Improve canonicalization performance #3119


Open: wants to merge 1 commit into main


ddeschepper

We're seeing severe performance problems when serializing some graphs with longturtle. I've narrowed this down to canonicalization, whose performance is also tracked in issue #2528.

Looking into it, I found that the current implementation of the _traces method of _TripleCanonicalizer is responsible for most of the slowdown.
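One way to do this kind of narrowing with only the standard library is to run the slow operation under cProfile and rank functions by their own (internal) time. This is an illustrative sketch, not necessarily how the hotspot was found here: `hottest_functions`, `slow_part`, `fast_part`, and `workload` are hypothetical names, and `workload` stands in for something like serializing a graph with longturtle.

```python
import cProfile
import pstats
from typing import Callable, List


def hottest_functions(func: Callable[[], object], n: int = 5) -> List[str]:
    """Run func under cProfile and return the n function names with the
    largest internal time (tottime, i.e. excluding time spent in callees)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers);
    # index 2 of the value tuple is tottime.
    ranked = sorted(stats.stats.items(), key=lambda item: item[1][2], reverse=True)
    return [funcname for (_file, _line, funcname), _ in ranked[:n]]


# Hypothetical toy workload: one quadratic hotspot and one cheap helper.
def slow_part(n: int = 300) -> int:
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def fast_part() -> int:
    return sum(range(100))


def workload() -> None:
    slow_part()
    fast_part()


if __name__ == "__main__":
    print(hottest_functions(workload, 3))
```

With a real serialization call in place of `workload`, and if the finding in this PR holds for your graphs, `_traces` should rank near the top of the list.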

This PR reduces the complexity of _traces, which speeds up our worst cases by more than an order of magnitude (100 s -> 4 s). All rdflib tests still pass. I also ran these changes against our set of a few hundred longturtle-serialized examples, and the serialization output is unchanged.
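An output-equivalence check like the one described above can be sketched as a per-file comparison of old and new output. This is a minimal sketch under assumptions: `output_diffs` is a hypothetical helper, and `old`/`new` are callables you supply, for example wrappers that parse a file and call `Graph.serialize(format="longturtle")` under the code before and after this change.

```python
import difflib
from typing import Callable, Dict


def output_diffs(
    inputs: Dict[str, str],
    old: Callable[[str], str],
    new: Callable[[str], str],
) -> Dict[str, str]:
    """Map each input name whose output changed to a unified diff of old vs new.

    An empty result means every input serialized identically.
    """
    diffs: Dict[str, str] = {}
    for name, data in sorted(inputs.items()):
        before, after = old(data), new(data)
        if before != after:
            diffs[name] = "".join(
                difflib.unified_diff(
                    before.splitlines(keepends=True),
                    after.splitlines(keepends=True),
                    fromfile=f"{name} (old)",
                    tofile=f"{name} (new)",
                )
            )
    return diffs
```

Returning the diffs rather than a pass/fail flag makes it easy to inspect exactly which lines a change would alter.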

The author of the linked issue wrote a performance test that, with the current code, gives the following timings (in seconds) on my machine:

file        isomorphic            canonical
test1.ttl   0.07787537574768066   0.03909921646118164
test2.ttl   3.528538942337036     1.8337273597717285
test3.ttl   20.140648365020752    9.535402774810791

With the changes in this PR, the same test gives:

file        isomorphic            canonical
test1.ttl   0.012566566467285156  0.006159543991088867
test2.ttl   0.15960264205932617   0.09874987602233887
test3.ttl   0.531768798828125     0.2606654167175293
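The relative speedup per file follows directly from the two sets of timings above. This small script only reuses the numbers reported in this PR; `TIMINGS` and `speedups` are illustrative names.

```python
from typing import Dict, Tuple

# (before, after) timings per file and operation, copied from the results above.
TIMINGS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "test1.ttl": {
        "isomorphic": (0.07787537574768066, 0.012566566467285156),
        "canonical": (0.03909921646118164, 0.006159543991088867),
    },
    "test2.ttl": {
        "isomorphic": (3.528538942337036, 0.15960264205932617),
        "canonical": (1.8337273597717285, 0.09874987602233887),
    },
    "test3.ttl": {
        "isomorphic": (20.140648365020752, 0.531768798828125),
        "canonical": (9.535402774810791, 0.2606654167175293),
    },
}


def speedups(
    timings: Dict[str, Dict[str, Tuple[float, float]]]
) -> Dict[str, Dict[str, float]]:
    """Map each file and operation to before / after (higher is better)."""
    return {
        name: {op: before / after for op, (before, after) in ops.items()}
        for name, ops in timings.items()
    }


if __name__ == "__main__":
    for name, ops in speedups(TIMINGS).items():
        print(name, ", ".join(f"{op} {x:.1f}x" for op, x in ops.items()))
```

For these runs the ratio is roughly 6x on test1.ttl and above 35x on test3.ttl, so the gain grows on the slower test cases.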

best = [refined_coloring]
color_score = tuple(c.key() for c in refined_coloring)

if best_score is None or best_score < color_score:
edmondchuc (Contributor) commented on May 7, 2025:


The right-hand side of the "or" is never evaluated, because "best_score is None" is always true here. Can you please review this?
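For reference, the usual shape of a keep-the-best loop sets the running best score once, before iterating, so that later candidates are actually compared against it. This is a generic sketch, not rdflib's `_TripleCanonicalizer._traces`: `select_best` and `score` are hypothetical names, scores are assumed to be tuples compared lexicographically (as `tuple(c.key() for c in refined_coloring)` is in the diff), and ties are simply collected.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Score = Tuple[Any, ...]


def select_best(
    candidates: Iterable[T], score: Callable[[T], Score]
) -> Tuple[List[T], Optional[Score]]:
    """Return every candidate with the maximal score, plus that score."""
    best: List[T] = []
    best_score: Optional[Score] = None  # set once, before the loop
    for candidate in candidates:
        color_score = score(candidate)
        if best_score is None or best_score < color_score:
            # First candidate, or strictly better: start a new list of winners.
            best = [candidate]
            best_score = color_score
        elif best_score == color_score:
            # Tie with the current best: keep it as well.
            best.append(candidate)
        # Otherwise the candidate scores lower and is dropped.
    return best, best_score
```

If `best_score` were still `None` every time the condition is reached (for example, because it is reset inside the loop), the `<` comparison would never run and each candidate would replace the previous best regardless of its score.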


2 participants