Improve canonicalization performance #3119

ddeschepper · 2025-04-25T10:57:11Z

We're seeing severe performance problems when using longturtle serialization on some graphs. I've narrowed this down to canonicalization, which is also tracked in issue #2528.

Looking into it, I found that the current implementation of the _traces method of _TripleCanonicalizer accounts for much of the slowdown.

This PR reduces the complexity of _traces, which gives a performance gain of at least an order of magnitude in our worst cases (100s -> 4s). All rdflib tests still pass. I've also tested these changes against our set of a few hundred longturtle-serialized examples, and the serialization output is unchanged.

The author of the linked issue has created a performance test that, with the current code, gives the following results on my machine:

file: test1.ttl
isomorphic: 0.07787537574768066
canonical: 0.03909921646118164

file: test2.ttl
isomorphic: 3.528538942337036
canonical: 1.8337273597717285

file: test3.ttl
isomorphic: 20.140648365020752
canonical: 9.535402774810791

where my new version results in:

file: test1.ttl
isomorphic: 0.012566566467285156
canonical: 0.006159543991088867

file: test2.ttl
isomorphic: 0.15960264205932617
canonical: 0.09874987602233887

file: test3.ttl
isomorphic: 0.531768798828125
canonical: 0.2606654167175293
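
To make the two sets of numbers easier to compare, here is a small sketch that computes the before/after ratio for each file. The timings are copied from the results above; the helper name `speedups` and the dictionary layout are only illustrative and are not part of rdflib or the linked performance test.

```python
# Timings in seconds, copied from the results reported above.
before = {
    "test1.ttl": {"isomorphic": 0.07787537574768066, "canonical": 0.03909921646118164},
    "test2.ttl": {"isomorphic": 3.528538942337036, "canonical": 1.8337273597717285},
    "test3.ttl": {"isomorphic": 20.140648365020752, "canonical": 9.535402774810791},
}
after = {
    "test1.ttl": {"isomorphic": 0.012566566467285156, "canonical": 0.006159543991088867},
    "test2.ttl": {"isomorphic": 0.15960264205932617, "canonical": 0.09874987602233887},
    "test3.ttl": {"isomorphic": 0.531768798828125, "canonical": 0.2606654167175293},
}


def speedups(before, after):
    """Return {file: {measurement: before / after}} (hypothetical helper)."""
    return {
        name: {kind: before[name][kind] / after[name][kind] for kind in before[name]}
        for name in before
    }


if __name__ == "__main__":
    for name, kinds in speedups(before, after).items():
        for kind, factor in kinds.items():
            print(f"{name} {kind}: {factor:.1f}x")
```

On these figures the ratio is roughly 6x for test1.ttl, about 19x to 22x for test2.ttl, and about 37x to 38x for test3.ttl, so the gain is largest on the slowest file.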

edmondchuc · 2025-05-07T05:12:48Z

rdflib/compare.py

-                best = [refined_coloring]
+            color_score = tuple(c.key() for c in refined_coloring)
+
+            if best_score is None or best_score < color_score:


The right-hand side of the or is never evaluated, because best_score is None is always true here. Could you please review this?
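
To illustrate the concern, here is a minimal standalone sketch of the "keep the best-scoring candidates" pattern. It is not the code from rdflib/compare.py: the function names `pick_best` and `pick_best_stale` and the plain tuples used as scores are assumptions made for the example. It only shows why the comparison on the right of the or never runs when best_score is never assigned.

```python
def pick_best(candidates, score):
    """Return every candidate that shares the highest score (hypothetical helper)."""
    best, best_score = [], None
    for cand in candidates:
        s = score(cand)
        if best_score is None or best_score < s:
            # New maximum: restart the list and remember the score.
            best, best_score = [cand], s
        elif best_score == s:
            # Tie with the current maximum: keep both.
            best.append(cand)
    return best


def pick_best_stale(candidates, score):
    """Same loop, but best_score is never updated (hypothetical, for contrast)."""
    best, best_score = [], None
    for cand in candidates:
        s = score(cand)
        # best_score is still None on every pass, so `best_score < s`
        # is short-circuited and the last candidate always wins.
        if best_score is None or best_score < s:
            best = [cand]
    return best


if __name__ == "__main__":
    scores = [(1, 2), (3, 0), (3, 0), (2, 9)]
    print(pick_best(scores, lambda t: t))
    print(pick_best_stale(scores, lambda t: t))
```

In the first function the score is stored alongside the winner, so later candidates are actually compared against it. In the second, the condition reduces to "always true", which is the behaviour the review comment describes.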

reduce _traces complexity

312138d

ddeschepper mentioned this pull request Apr 25, 2025

Performance issues with rdflib.compare #2528

Open

edmondchuc mentioned this pull request May 7, 2025

7.x canonicalization perf #3135

Draft

8 tasks

edmondchuc reviewed May 7, 2025
