Skip to content

Feature Request: Implement Suffix Array and Suffix Tree #648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
arvinder004 opened this issue Mar 16, 2025 · 0 comments
Open

Feature Request: Implement Suffix Array and Suffix Tree #648

arvinder004 opened this issue Mar 16, 2025 · 0 comments

Comments

@arvinder004
Copy link
Contributor

arvinder004 commented Mar 16, 2025

A Suffix Array for a string S of length n is an array of integers in the range [0, n-1] that specifies the lexicographical order of all suffixes of S. It is often used in conjunction with the Longest Common Prefix (LCP) array, which stores the length of the longest common prefix between consecutive suffixes in the sorted suffix array.

A Suffix Tree for a string S is a rooted tree where each edge represents a non-empty substring of S, and each path from the root to a leaf node represents a unique suffix of S. Internal nodes (except possibly the root) have at least two children.

Use Cases:

Both Suffix Arrays and Suffix Trees have numerous applications in various fields, including:

Bioinformatics: Sequence alignment, finding repeated patterns in DNA or protein sequences.
Text Processing: Full-text search, indexing, text compression.
Data Compression: Algorithms like LZ77 use concepts related to finding repeated substrings.
Information Retrieval: Efficiently searching large text corpora.
Computational Linguistics: Analyzing text structure and patterns.
Finding the Longest Common Substring of multiple strings.
Finding the Longest Palindromic Substring.
String Alignment.

I propose implementing the following:

Suffix Array:

A class SuffixArray that takes a string as input during initialization.
A method to build the suffix array.
A method to optionally compute the LCP array.
Methods for searching for a pattern within the text using the suffix array.

Suffix Tree:

A class SuffixTree that takes a string as input during initialization.
A method to build the suffix tree.

I am interested in contributing to the implementation of these data structures. I would appreciate feedback from the project maintainers on this proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant