Commit f906891

Add gradio demos
NOTE: Gradio loads resources over the network, so add a disclaimer for users that Google CDNs will be used
1 parent 019fcc3 commit f906891

14 files changed, +1198 -3 lines changed

.dockerignore

+35
@@ -0,0 +1,35 @@
+# test it according to https://stackoverflow.com/a/71751097/9360161
+# rsync -avnh . /dev/shm --exclude-from .dockerignore
+
+# https://stackoverflow.com/a/68196656/9360161
+# docker build --no-cache --progress plain --file - . <<EOF
+# FROM busybox
+# COPY . /build-context
+# WORKDIR /build-context
+# RUN find .
+# EOF
+
+# docker build -t docker-show-context https://github.com/pwaller/docker-show-context.git
+# docker run --rm -v $PWD:/data docker-show-context
+
+**.py[ocd]
+**/__pycache__/
+**/*.egg-info/
+build/
+dist/
+
+htmlcov/
+.coverage
+coverage.xml
+
+.tox/
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+
+.vscode/
+.git/
+
+data/
+
+venv/

README.md

+7-1
@@ -3,7 +3,7 @@
 The LCC-NLP tools for now only include a **sentence segmentizer**, **sentence cleaner**, **sentence language classifier** and **word tokenizer**. Additionally, it provides some methods to work with SOURCE and MEDUSA file formats that are in use at LCC.
 This library can be used embedded, with [spaCy](https://spacy.io/) or as CLI tool.
 
-[Installation](#installation) | [Configuration](#configuration-and-resources) | [CLI Usage](#run-cli) | [spaCy Integration](#integration-with-spacy) | [Development](#development)
+[Installation](#installation) | [Configuration](#configuration-and-resources) | [CLI Usage](#run-cli) | [spaCy Integration](#integration-with-spacy) | [Demos](#demos) | [Development](#development)
 
 Licensed under [_GNU General Public License v3 or later (GPLv3+)_](LICENSE).
 
@@ -262,6 +262,12 @@ False
 ('deu', 1.0)
 ```
 
+## Demos
+
+### Gradio
+
+Gradio demo applications can be found in [`examples/gradio/`](examples/gradio/). There is also a [`Dockerfile`](examples/gradio/Dockerfile) to allow easy deployment.
+
 ## Development
 
 Optional dependencies:

examples/README.md

-1
This file was deleted.

examples/gradio/Dockerfile

+39
@@ -0,0 +1,39 @@
+# FROM python:3.12.1-slim-bookworm
+FROM python:3.12.1-alpine3.19
+
+ENV PIP_ROOT_USER_ACTION=ignore
+# home for "nobody" user (e.g., caches, configs)
+ENV HOME=/workspace
+# gradio stuff
+ARG GRADIO_SERVER_PORT=7860
+ENV GRADIO_SERVER_PORT=${GRADIO_SERVER_PORT}
+ENV GRADIO_NUM_PORTS=1
+ENV GRADIO_SERVER_NAME="0.0.0.0"
+ENV GRADIO_ANALYTICS_ENABLED=False
+# whether file paths to tool config files can be set
+ENV READONLY_PATHS=True
+
+EXPOSE ${GRADIO_SERVER_PORT}
+
+WORKDIR /workspace
+
+# standard pip update
+RUN python3 -m pip install -U pip setuptools wheel
+
+# resources
+COPY resources /workspace/resources
+
+# install LCC-NLP tools
+COPY setup.cfg setup.py pyproject.toml MANIFEST.in LICENSE /workspace/
+COPY src /workspace/src
+RUN python3 -m pip install .
+
+# add gradio stuff
+COPY examples/gradio /workspace/gradio
+RUN python3 -m pip install -r /workspace/gradio/requirements.txt
+
+# downgrade user
+RUN chmod -R o+rw /workspace
+USER nobody
+
+CMD ["python3", "/workspace/gradio/lcc_demo.py"]
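
Review note: the `READONLY_PATHS=True` value set in this Dockerfile is read by `cleaner.py` in this commit with a case-insensitive comparison against the string `"true"`. The sketch below isolates that comparison to show which values enable the flag. The helper name `env_flag` is hypothetical and not part of the commit.

```python
import os


def env_flag(name: str, default: str = "True") -> bool:
    """Hypothetical helper: only the string 'true' (in any case) enables the flag."""
    return os.getenv(name, default).lower() == "true"


# An unset variable falls back to the default, i.e. read-only paths.
os.environ.pop("READONLY_PATHS", None)
readonly_default = env_flag("READONLY_PATHS")

# Upper or mixed case still counts as enabled.
os.environ["READONLY_PATHS"] = "TRUE"
readonly_upper = env_flag("READONLY_PATHS")

# Values such as "1" or "yes" do NOT enable the flag with this comparison.
os.environ["READONLY_PATHS"] = "1"
readonly_one = env_flag("READONLY_PATHS")

print(readonly_default, readonly_upper, readonly_one)
```

With this style of parsing, starting the container with `-e READONLY_PATHS=1` would make the path fields editable rather than read-only, so the variable is best set to `True` or `False` explicitly.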

examples/gradio/README.md

+49
@@ -0,0 +1,49 @@
+# Gradio Demos
+
+Note that all the Gradio demos do some basic parameter sanitization: all paths are resolved and rejected if they point outside the current working directory (i.e., into a parent directory). This means the demos' resources must be copied rather than symlinked, or you must run the Python scripts from the root of this repo.
+
+## Single tool demo
+
+```bash
+python examples/gradio/segmentizer.py
+```
+
+```bash
+python examples/gradio/tokenizer.py
+```
+
+```bash
+python examples/gradio/lani.py
+```
+
+## All LCC tools demo
+
+```bash
+python examples/gradio/lcc_demo.py
+```
+
+## Development (Hot-Reloading)
+
+Specify the demo block name and the file.
+
+```bash
+gradio --demo-name lcc_demo examples/gradio/lcc_demo.py
+```
+
+## Docker Deployment
+
+Run the following commands from the root of this repo!
+
+```bash
+docker build -f examples/gradio/Dockerfile -t lcc-gradio-demo .
+```
+
+```bash
+docker run --rm -it -p "8080:7860" --name lcc-gradio-demo lcc-gradio-demo
+```
+
+You can map custom resources into the container at `/workspace/resources/`.
+
+Visit http://localhost:8080 to access the Gradio demo.
+
+_Dev hint: use https://github.com/pwaller/docker-show-context to check your `.dockerignore` file._

examples/gradio/cleaner.py

+141
@@ -0,0 +1,141 @@
+import os.path
+from typing import Optional
+
+import gradio as gr
+
+import lcc.cleaner
+
+# ---------------------------------------------------------------------------
+
+
+READONLY_PATHS = os.getenv("READONLY_PATHS", "True").lower() == "true"
+
+
+# ---------------------------------------------------------------------------
+
+
+with gr.Blocks() as cleaner:
+    dn_resources = "resources/cleaner"
+
+    # configuration
+    with gr.Accordion("⚙️ Sentence Cleaner Configuration", open=False):
+        dn_rules = gr.Textbox(
+            value=dn_resources,
+            label="Folder with cleaner rule files",
+            max_lines=1,
+            interactive=not READONLY_PATHS,
+            visible=not READONLY_PATHS,
+        )
+        fn_replacements = gr.Textbox(
+            value="StringReplacements.list",
+            label="String replacement mapping file (filename in rules folder)",
+            max_lines=1,
+            interactive=not READONLY_PATHS,
+            visible=not READONLY_PATHS,
+        )
+
+        with gr.Row():
+            text_type = gr.Textbox(
+                label="Text type, e.g., 'web', 'news'/'newscrawl', 'wikipedia', ...",
+                max_lines=1,
+                interactive=True,
+            )
+            lang_code = gr.Textbox(
+                label="Language of text, e.g., 'deu', 'eng', ...",
+                max_lines=1,
+                interactive=True,
+            )
+
+        do_replacements = gr.Checkbox(
+            value=True,
+            label="Whether to apply text replacements before checking sentence quality",
+            interactive=True,
+        )
+
+    # inputs
+    document_text = gr.Textbox(
+        lines=3, label="Text", placeholder="Enter a single sentence"
+    )
+
+    # action buttons
+    cleaner_btn = gr.Button("Sanitize Sentences", variant="primary")
+
+    # outputs
+    with gr.Group():
+        status_text = gr.HTML(label="Sentence Quality Status")
+        replaced_text = gr.Textbox(
+            label="Text with replacements", show_copy_button=True
+        )
+        filter_details_json = gr.JSON(label="Filter rules that flagged this sentence")
+
+    # worker function
+    def clean_text(
+        text: str,
+        dn_rules: Optional[str],
+        text_type: Optional[str] = None,
+        lang_code: Optional[str] = None,
+        fn_replacements: Optional[str] = None,
+        do_replacements: Optional[bool] = True,
+    ):
+        cwd = os.getcwd()
+        if os.path.relpath(os.path.realpath(dn_rules), cwd).startswith("../"):
+            raise gr.Error("Invalid 'dn_rules' path!")
+        if fn_replacements and os.path.relpath(
+            os.path.realpath(os.path.join(dn_rules, fn_replacements)), cwd
+        ).startswith("../"):
+            raise gr.Error("Invalid 'fn_replacements' filename!")
+
+        if not os.path.isdir(dn_rules):
+            gr.Warning("Folder 'dn_rules' not found!")
+        if not os.path.isfile(os.path.join(dn_rules, fn_replacements)):
+            gr.Warning("File 'fn_replacements' not found!")
+
+        lcc_cleaner = lcc.cleaner.SentenceCleaner(
+            dn_rules=dn_rules,
+            text_type=text_type,
+            lang_code=lang_code,
+            fn_replacements=fn_replacements,
+        )
+
+        replaced = lcc_cleaner.replacer.replace(text) if do_replacements else None
+        results = lcc_cleaner.filter_sentence_results(
+            text, do_replacements=do_replacements
+        )
+        filter_details = [
+            {"rule": filter.id_, "description": filter.description, "hit": hit}
+            for filter, hit in results.items()
+            if hit
+        ]
+
+        is_ok = not filter_details
+
+        status = "✅ Sentence is ok." if is_ok else "❎ Sentence is bad!"
+        new_status_text = gr.HTML(
+            value=f"<p style='margin: 1rem; text-align: center; font-weight: bold;'>{status}</p>"
+        )
+
+        return {
+            replaced_text: replaced,
+            filter_details_json: filter_details if filter_details else None,
+            status_text: new_status_text,
+        }
+
+    # action buttons event handler
+    cleaner_btn.click(
+        fn=clean_text,
+        inputs=[
+            document_text,
+            dn_rules,
+            text_type,
+            lang_code,
+            fn_replacements,
+            do_replacements,
+        ],
+        outputs=[replaced_text, filter_details_json, status_text],
+    )
+
+
+# ---------------------------------------------------------------------------
+
+if __name__ == "__main__":
+    cleaner.launch(show_api=False, share=False)
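
Review note: the path check in `clean_text` compares the relative path against the prefix `"../"`. One edge case that may be worth checking is that `os.path.relpath` returns exactly `..` for the parent directory itself, which does not start with `"../"`, and that the separator differs on Windows. The sketch below shows one possible alternative based on `os.path.commonpath`. The helper name `is_inside` is hypothetical and not part of the commit, and this is an illustration under those assumptions rather than the project's own method.

```python
import os.path
import tempfile


def is_inside(path: str, base: str) -> bool:
    """Hypothetical helper: True if `path` resolves to `base` or to something below it."""
    real_base = os.path.realpath(base)
    real_path = os.path.realpath(path)
    try:
        # commonpath compares whole path components, so a sibling folder that
        # merely shares a name prefix with the base is not treated as inside it.
        return os.path.commonpath([real_base, real_path]) == real_base
    except ValueError:
        # Raised e.g. on Windows when the two paths are on different drives.
        return False


with tempfile.TemporaryDirectory() as base:
    checks = {
        "subfolder": is_inside(os.path.join(base, "resources", "cleaner"), base),
        "base itself": is_inside(base, base),
        "parent": is_inside(os.path.join(base, ".."), base),
        "sibling with shared prefix": is_inside(base + "2", base),
    }

print(checks)
```

Because both sides go through `os.path.realpath`, symlinked resources that resolve outside the base are still rejected, which matches the behavior described in the Gradio README of this commit.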

examples/gradio/favicon.ico

2.19 KB
Binary file not shown.
