Commit 8e54b8b

Merge pull request #2 from duedil-ltd/feature/docs
Documentation
2 parents c577614 + da7a00b commit 8e54b8b

3 files changed: +45 -2 lines changed

README.md

Lines changed: 43 additions & 1 deletion
@@ -3,4 +3,46 @@ python-lzo-indexer
 
 ![](https://travis-ci.org/duedil-ltd/python-lzo-indexer.png)
 
-Python library for indexing block offsets within LZO compressed files.
+Python library for indexing block offsets within LZO compressed files. The implementation is largely based on that of the [Hadoop Library](https://github.com/twitter/hadoop-lzo). Index files are used to allow Hadoop to split a single file compressed with LZO into several chunks for parallel processing.
+
+Since LZO is a block-based compression algorithm, we can split the file along block boundaries and decompress each block on its own. The index is a file containing byte offsets for each block in the original LZO file.
+
+
+Example
+-------
+
+The Python code below demonstrates how easy it is to index an LZO file. This library also supports indexing a string, and provides a method to return the individual block offsets should you need to create a file of your own format.
+
+```python
+import lzo_indexer
+
+# Both files are opened in binary mode: the LZO input is compressed data
+# and the index is written as raw bytes.
+with open("my-file.lzo", "rb") as f:
+    with open("my-file.lzo.index", "wb") as index:
+        lzo_indexer.index_lzo_file(f, index)
+```
+
+
+Command-line Utility
+--------------------
+
+This library also includes a utility for indexing multiple LZO files, using the Python indexer. This is a much faster alternative to the command-line utility built into the hadoop-lzo library, as it avoids the JVM.
+
+```
+$ bin/lzo-indexer --help
+
+usage: lzo-indexer [-h] [--verbose] [--force] lzo_files [lzo_files ...]
+
+positional arguments:
+  lzo_files      List of LZO files to index
+
+optional arguments:
+  -h, --help     show this help message and exit
+  --verbose, -v  Enable verbose logging
+  --force, -f    Force re-creation of an index even if it exists
+```
+
+
+Contributions
+-------------
+
+I welcome any contributions, though I request that any pull requests come with test coverage.
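
Note on the index format: the README describes the index as a file of byte offsets, one per block. The sketch below shows what reading and writing such a file could look like. It assumes the layout used by hadoop-lzo, where each offset is stored as a consecutive 8-byte big-endian integer; this commit does not confirm that layout. The helper names `write_index` and `read_index` are hypothetical and not part of `lzo_indexer`, and the offsets are made-up values.

```python
import io
import struct

# Assumption: one 8-byte big-endian integer per block offset.
OFFSET = struct.Struct(">Q")


def write_index(offsets, index_file):
    """Write each block offset as a fixed-width big-endian integer."""
    for offset in offsets:
        index_file.write(OFFSET.pack(offset))


def read_index(index_file):
    """Read back every block offset stored in an index file."""
    data = index_file.read()
    if len(data) % OFFSET.size:
        raise ValueError("index length is not a multiple of 8 bytes")
    return [
        OFFSET.unpack_from(data, pos)[0]
        for pos in range(0, len(data), OFFSET.size)
    ]


# Round-trip some illustrative (made-up) offsets through an in-memory file.
buf = io.BytesIO()
write_index([38, 65574, 131110], buf)
buf.seek(0)
offsets = read_index(buf)
```

A fixed-width record per block means a reader can jump straight to the n-th offset without parsing anything before it.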

bin/lzo-indexer

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ def parse_args(argv):
     parser.add_argument("--verbose", "-v", default=False, action="store_true",
                         help="Enable verbose logging")
     parser.add_argument("--force", "-f", default=False, action="store_true",
-                        help="Force re-creation of an index even if it exsts")
+                        help="Force re-creation of an index even if it exists")
     parser.add_argument("lzo_files", type=str, nargs="+",
                         help="List of LZO files to index")
 
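
For context on how these arguments behave, here is a minimal, self-contained sketch of a parser built from the three `add_argument` calls visible in the hunk above. Only those calls come from the diff. The `prog` name is taken from the usage text in the README, and the way the parser is constructed and invoked is an assumption, since the rest of `parse_args` is not shown in this commit.

```python
import argparse


def parse_args(argv):
    # Assumption: parser construction is not shown in the diff; prog mirrors
    # the usage line printed in the README.
    parser = argparse.ArgumentParser(prog="lzo-indexer")
    parser.add_argument("--verbose", "-v", default=False, action="store_true",
                        help="Enable verbose logging")
    parser.add_argument("--force", "-f", default=False, action="store_true",
                        help="Force re-creation of an index even if it exists")
    parser.add_argument("lzo_files", type=str, nargs="+",
                        help="List of LZO files to index")
    # Assumption: the real script may pass argv differently.
    return parser.parse_args(argv)


# Illustrative invocation with made-up file names.
args = parse_args(["--force", "a.lzo", "b.lzo"])
```

Because `lzo_files` uses `nargs="+"`, at least one file is required and all positional values are collected into a list.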

setup.py

Lines changed: 1 addition & 0 deletions
@@ -18,5 +18,6 @@ def read(filename):
     download_url="https://github.com/duedil-ltd/python-lzo-indexer/archive/release-0.0.1.zip",
     license=read("LICENSE"),
     packages=["lzo_indexer"],
+    scripts=["bin/lzo-indexer"],
     test_suite="tests.test_indexer",
 )
