Commit 09de843

Merge pull request #32 from Yukti-1/Adding-Topic-String-Processing
Adding topics String hashing and Rabin-Karp in String processing
2 parents 11803ae + 7cacb3b commit 09de843

4 files changed

+101
-0
lines changed

src/SUMMARY.md

Lines changed: 6 additions & 0 deletions
@@ -26,3 +26,9 @@
2626
- [Graph](./Graph/Graph.md)
2727
- [Tree](./Graph/Tree/Tree.md)
2828
- [Diameter](./Graph/Tree/Diameter/diameter.md)
29+
- [String Processing](./String_Processing/String_Processing.md)
30+
- [String Hashing](./String_Processing/String_Hashing/String_Hashing.md)
31+
- [Rabin-Karp Algorithm](./String_Processing/Rabin-Karp_Algorithm/Rabin-Karp.md)
32+
33+
34+
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# Rabin-Karp Algorithm
2+
3+
The Rabin-Karp algorithm is an application of *string hashing* to pattern matching.
4+
5+
Given two strings, a pattern *s* and a text *t*, determine whether the pattern appears in the text and, if it does, enumerate all its occurrences in O(|s|+|t|) time.
6+
7+
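***Algorithm*** : First, the hash of the pattern *s* is calculated. Then the prefix hashes of the text *t* are calculated, from which the hash of every substring of *t* of length |s| can be obtained in constant time. Each such substring can now be compared with the pattern in constant time by comparing hashes. Because two different strings can have the same hash, a match reported this way is correct only with high probability.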
8+
9+
## Implementation
10+
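```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Returns the starting indices of all occurrences of pattern s in text t.
// Assumes both strings contain only lowercase letters 'a'..'z'.
vector<int> rabin_karp(string const& s, string const& t)
{
    const int p = 31;
    const int m = 1e9 + 9;
    int S = s.size(), T = t.size();
    vector<int> occurrences;
    if (S == 0 || S > T)
        return occurrences;  // empty pattern, or pattern longer than the text

    // p_pow[i] = p^i mod m
    vector<long long> p_pow(max(S, T));
    p_pow[0] = 1;
    for (int i = 1; i < (int)p_pow.size(); i++)
        p_pow[i] = (p_pow[i-1] * p) % m;

    // h[i] = hash of the prefix t[0..i-1]
    vector<long long> h(T + 1, 0);
    for (int i = 0; i < T; i++)
        h[i+1] = (h[i] + (t[i] - 'a' + 1) * p_pow[i]) % m;

    // Hash of the pattern
    long long h_s = 0;
    for (int i = 0; i < S; i++)
        h_s = (h_s + (s[i] - 'a' + 1) * p_pow[i]) % m;

    for (int i = 0; i + S - 1 < T; i++)
    {
        // Hash of t[i..i+S-1], still multiplied by p^i
        long long cur_h = (h[i+S] + m - h[i]) % m;
        // Scale the pattern hash by p^i instead of dividing cur_h
        if (cur_h == h_s * p_pow[i] % m)
            occurrences.push_back(i);
    }
    return occurrences;
}
```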
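The comparison `cur_h == h_s * p_pow[i] % m` in the implementation above relies on the fact that a window hash taken from prefix hashes is still multiplied by p^i. The following is a minimal sketch of that identity, not part of the original implementation; the names `full_hash`, `shifted_window_hash` and `power_of_p` are hypothetical helpers introduced only for illustration, and lowercase input is assumed.

```cpp
#include <string>
#include <vector>

const long long P = 31;          // same base as in the implementation above
const long long M = 1000000009;  // same modulus, 10^9 + 9

// Polynomial hash of a whole string: sum of (c - 'a' + 1) * P^i mod M.
long long full_hash(const std::string& s) {
    long long h = 0, pw = 1;
    for (char c : s) {
        h = (h + (c - 'a' + 1) * pw) % M;
        pw = pw * P % M;
    }
    return h;
}

// Hash of t[pos .. pos+len-1] obtained from prefix hashes.
// The result is still multiplied by P^pos.
long long shifted_window_hash(const std::string& t, int pos, int len) {
    std::vector<long long> h(t.size() + 1, 0);
    long long pw = 1;
    for (std::size_t i = 0; i < t.size(); i++) {
        h[i + 1] = (h[i] + (t[i] - 'a' + 1) * pw) % M;
        pw = pw * P % M;
    }
    return (h[pos + len] + M - h[pos]) % M;
}

// P^e mod M by repeated multiplication (enough for a small illustration).
long long power_of_p(int e) {
    long long result = 1;
    for (int k = 0; k < e; k++)
        result = result * P % M;
    return result;
}
```

For the text `"abcab"` and the pattern `"ab"`, the window starting at index 3 satisfies `shifted_window_hash("abcab", 3, 2) == full_hash("ab") * power_of_p(3) % M`, while the window starting at index 1 (`"bc"`) does not satisfy the corresponding equality.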
36+
## Problems for Practice
37+
38+
- [Good_Substrings](https://codeforces.com/problemset/problem/271/D)
39+
- [Pattern_Find](https://www.spoj.com/problems/NAJPF/)
40+
41+
## References
42+
43+
- [CP-Algorithms](https://cp-algorithms.com/)
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
# String Hashing
2+
3+
Hashing lets us compare strings efficiently. The idea is to convert each string to an integer and compare the integers instead of the strings themselves, which is an O(1) operation once the hashes are known. The conversion is done by a ***hash function***, and the integer obtained for a string is called the *hash* of the string.
4+
A widely used function is the *polynomial rolling hash function*:
5+
6+
![](https://hapq.me/content/images/2019/11/Screen-Shot-2019-11-06-at-4.59.06-PM.png)
7+
8+
where *p* and *m* are chosen positive numbers. *p* is a prime roughly equal to the number of characters in the input alphabet, and *m* is a large number.
9+
Here, *p* = 31 and *m* = 10^9 + 9, which is a large prime.
10+
11+
*The alphabet can be large and strings can be long, so the exact value of the polynomial quickly becomes too large to store in an integer variable. The value is therefore computed using modular arithmetic, which keeps every hash small enough to fit in an integer variable.*
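In text form, assuming the pictured formula matches the implementation on this page (each lowercase letter mapped to 1..26, so s[i] stands for that mapped value), the hash and a small worked example for the string "abc" with p = 31 can be written as:

```latex
\[
\operatorname{hash}(s) = \left( \sum_{i=0}^{n-1} s[i] \cdot p^{i} \right) \bmod m
\]
% Worked example for s = "abc" with p = 31 (a -> 1, b -> 2, c -> 3);
% the sum is far below m, so the modulo does not change it.
\[
\operatorname{hash}(\text{abc}) = 1 \cdot 31^{0} + 2 \cdot 31^{1} + 3 \cdot 31^{2}
                                = 1 + 62 + 2883 = 2946
\]
```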
12+
13+
## Implementation
14+
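```cpp
#include <string>
using namespace std;

// Polynomial rolling hash of s.
// Assumes s contains only lowercase letters 'a'..'z'.
long long compute_hash(string const& s)
{
    const int p = 31;       // prime close to the alphabet size (26)
    const int m = 1e9 + 9;  // large prime modulus
    long long hash_value = 0;
    long long p_pow = 1;    // p^i mod m for the current position i
    for (char c : s)
    {
        // Map 'a'..'z' to 1..26 so that "a", "aa", ... do not all hash to 0
        hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
        p_pow = (p_pow * p) % m;
    }
    return hash_value;
}
```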
29+
Two strings with equal hashes need not be equal, because distinct strings can collide. The probability of a collision can be reduced by computing two hashes with different values of *p* and *m* and treating two strings as equal only if both hashes match.
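A minimal sketch of hashing with two different (*p*, *m*) pairs is given here. The second pair (53 and 10^9 + 7) and the names `poly_hash` and `double_hash` are assumptions chosen for illustration, not values taken from this page; lowercase input is assumed as before.

```cpp
#include <string>
#include <utility>

// One polynomial rolling hash with a caller-supplied base p and modulus m.
long long poly_hash(const std::string& s, long long p, long long m) {
    long long hash_value = 0;
    long long p_pow = 1;
    for (char c : s) {
        hash_value = (hash_value + (c - 'a' + 1) * p_pow) % m;
        p_pow = (p_pow * p) % m;
    }
    return hash_value;
}

// Pair of hashes; two strings are treated as equal only if both parts match.
// The second (p, m) pair is an illustrative assumption.
std::pair<long long, long long> double_hash(const std::string& s) {
    return { poly_hash(s, 31, 1000000009LL),
             poly_hash(s, 53, 1000000007LL) };
}
```

Comparing `double_hash(a) == double_hash(b)` then replaces the single-hash comparison. A collision now requires both hashes to collide at once, which is far less likely than a collision in either one alone.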
30+
31+
## Examples Of Uses
32+
33+
- Find all the duplicate strings from a given list of strings
34+
- Find the number of different substrings in a string
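
As an illustration of the second use, the sketch below counts the distinct substrings of a string by collecting, for each length, the set of substring hashes. It assumes lowercase input and accepts the small collision risk discussed above; the function name `count_distinct_substrings` is chosen here for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Counts distinct substrings of s by comparing hashes instead of strings.
// Runs in O(n^2 log n); the answer can be off if two hashes collide.
int count_distinct_substrings(const std::string& s) {
    const int p = 31;
    const int m = 1000000009;
    int n = s.size();
    if (n == 0)
        return 0;

    // p_pow[i] = p^i mod m
    std::vector<long long> p_pow(n);
    p_pow[0] = 1;
    for (int i = 1; i < n; i++)
        p_pow[i] = p_pow[i - 1] * p % m;

    // h[i] = hash of the prefix s[0..i-1]
    std::vector<long long> h(n + 1, 0);
    for (int i = 0; i < n; i++)
        h[i + 1] = (h[i] + (s[i] - 'a' + 1) * p_pow[i]) % m;

    int count = 0;
    for (int len = 1; len <= n; len++) {
        std::set<long long> seen;
        for (int i = 0; i + len <= n; i++) {
            long long cur_h = (h[i + len] + m - h[i]) % m;
            // Bring every window to the same power of p so hashes are comparable
            cur_h = cur_h * p_pow[n - i - 1] % m;
            seen.insert(cur_h);
        }
        count += seen.size();
    }
    return count;
}
```

For example, `"abab"` has the distinct substrings a, b, ab, ba, aba, bab and abab, so the function returns 7 for it.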
35+
36+
## Practice Problems
37+
38+
- [A Needle in the Haystack - SPOJ](https://www.spoj.com/problems/NHAY/)
39+
- [Password - Codeforces](https://codeforces.com/problemset/problem/126/B)
40+
41+
## References
42+
43+
- [CP-Algorithms](https://cp-algorithms.com/)
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# String Processing
2+
3+
A string is a sequence of symbols or characters. A common problem is to search for a given pattern in a given string. The straightforward method is to try every starting index of the string and compare the pattern with the string character by character, but this becomes slow as the length of the
4+
string increases.
5+
Hashing algorithms prove to be very useful in this case.
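
For comparison, the straightforward method can be sketched as follows. This is a minimal illustration, with `naive_search` as a hypothetical name; it takes O(|s|·|t|) time in the worst case for a pattern *s* and a text *t*.

```cpp
#include <string>
#include <vector>

// Tries every starting index of t and compares the pattern s directly.
std::vector<int> naive_search(const std::string& s, const std::string& t) {
    std::vector<int> occurrences;
    if (s.empty() || s.size() > t.size())
        return occurrences;
    for (std::size_t i = 0; i + s.size() <= t.size(); i++) {
        // compare() returns 0 when t[i .. i+|s|-1] equals s
        if (t.compare(i, s.size(), s) == 0)
            occurrences.push_back(static_cast<int>(i));
    }
    return occurrences;
}
```

Each of the up to |t| starting positions may cost up to |s| character comparisons, which is the slowdown that hashing avoids.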
6+
The topics covered are:
7+
8+
- [String Hashing](./String_Hashing/String_Hashing.md)
9+
- [Rabin-Karp Algorithm](./Rabin-Karp_Algorithm/Rabin-Karp.md)
