diff --git a/docs/getting-started/5-output-rails/README.md b/docs/getting-started/5-output-rails/README.md
index c8f0be042..43965c61e 100644
--- a/docs/getting-started/5-output-rails/README.md
+++ b/docs/getting-started/5-output-rails/README.md
@@ -183,12 +183,15 @@ You can enable streaming to provide asynchronous responses and reduce the time t
        flows:
          - self check output
        streaming:
+         enabled: True
          chunk_size: 200
          context_size: 50

    streaming: True
    ```

+   The `enabled: True` field is required to enable streaming output rails while the `streaming: True` field is needed to enable streaming generation.
+
 1. Call the `stream_async` method and handle the chunked response:

    ```python
diff --git a/docs/user-guides/configuration-guide.md b/docs/user-guides/configuration-guide.md
index 2f5b1d532..073cc7b93 100644
--- a/docs/user-guides/configuration-guide.md
+++ b/docs/user-guides/configuration-guide.md
@@ -103,6 +103,7 @@ nemoguardrails find-providers [--list]
 ```

 The command supports two modes:
+
 - Interactive mode (default): Guides you through selecting a provider type (text completion or chat completion) and then shows available providers for that type
 - List mode (`--list`): Simply lists all available providers without interactive selection

@@ -679,13 +680,14 @@ You can enable streaming to begin receiving responses from the output rail soone

 You must set the top-level `streaming: True` field in your `config.yml` file.

-For each output rail, add the `streaming` field and configuration parameters.
+For the output rails, add the `streaming` field and configuration parameters.

 ```yaml
 rails:
   output:
     - rail name
     streaming:
+      enabled: True
       chunk_size: 200
       context_size: 50
       stream_first: True
@@ -735,6 +737,11 @@ The following table describes the subfields for the `streaming` field:
     Specifying approximately 25% of `chunk_size` provides a good compromise.
   - `50`

+* - streaming.enabled
+  - When set to `True`, the toolkit executes output rails in streaming mode.
+
+  - `False`
+
 * - streaming.stream_first
   - When set to `False`, the toolkit applies the output rails to the chunks before streaming them to the client.
     If you set this field to `False`, you can avoid streaming chunks of blocked content.