CHANGELOG.md (5 additions, 0 deletions)
@@ -2,10 +2,15 @@
2
2
3
3
## Next
4
4
5
+
### Added
6
+
7
+
- Added support for automatic schema extraction from text using LLMs. In `SimpleKGPipeline`, automatic schema extraction is enabled by default when the user provides no schema.
8
+
5
9
### Fixed
6
10
7
11
- Fixed a bug where `spacy` and `rapidfuzz` needed to be installed even if not using the relevant entity resolvers.
@@ -21,7 +21,7 @@ A Knowledge Graph (KG) construction pipeline requires a few components (some of
21
21
- **Data loader**: extract text from files (PDFs, ...).
22
22
- **Text splitter**: split the text into smaller pieces of text (chunks), manageable by the LLM context window (token limit).
23
23
- **Chunk embedder** (optional): compute the chunk embeddings.
24
-
- **Schema builder**: provide a schema to ground the LLM extracted entities and relations and obtain an easily navigable KG.
24
+
- **Schema builder**: provide a schema to ground the LLM extracted entities and relations and obtain an easily navigable KG. The schema can be provided manually or extracted automatically using an LLM.
25
25
- **Lexical graph builder**: build the lexical graph (Document, Chunk and their relationships) (optional).
26
26
- **Entity and relation extractor**: extract relevant entities and relations from the text.
27
27
- **Knowledge Graph writer**: save the identified entities and relations.
@@ -75,10 +75,11 @@ Graph Schema
75
75
76
76
It is possible to guide the LLM by supplying a list of entities, relationships,
77
77
and instructions on how to connect them. However, note that the extracted graph
78
-
may not fully adhere to these guidelines. Entities and relationships can be
79
-
represented as either simple strings (for their labels) or dictionaries. If using
80
-
a dictionary, it must include a label key and can optionally include description
81
-
and properties keys, as shown below:
78
+
may not fully adhere to these guidelines unless schema enforcement is enabled
79
+
(see :ref:`Schema Enforcement Behaviour <schema-enforcement-behaviour>`). Entities and relationships can be represented
80
+
as either simple strings (for their labels) or dictionaries. If using a dictionary,
81
+
it must include a label key and can optionally include description and properties keys,
82
+
as shown below:
82
83
83
84
.. code:: python
84
85
@@ -117,14 +118,20 @@ This schema information can be provided to the `SimpleKGBuilder` as demonstrated
117
118
118
119
.. code:: python
119
120
121
+
# Using the schema parameter (recommended approach)
120
122
kg_builder = SimpleKGPipeline(
121
123
# ...
122
-
entities=ENTITIES,
123
-
relations=RELATIONS,
124
-
potential_schema=POTENTIAL_SCHEMA,
124
+
schema={
125
+
"entities": ENTITIES,
126
+
"relations": RELATIONS,
127
+
"potential_schema": POTENTIAL_SCHEMA
128
+
},
125
129
# ...
126
130
)
127
131
132
+
.. note::
133
+
By default, if no schema is provided to the `SimpleKGPipeline`, automatic schema extraction is performed using the LLM (see :ref:`Automatic Schema Extraction`).
134
+
128
135
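As a rough illustration of the structure passed through the `schema` parameter, the sketch below builds the dictionary from plain Python data and checks that every triple in `potential_schema` refers to declared labels. The `labels_of` and `undeclared_triples` helpers are hypothetical and are not part of `neo4j_graphrag`; the entity and relation values mirror the configuration examples in this documentation.

```python
from typing import Any, Union

# An entity or relation is either a bare label or a dict with a "label" key.
SchemaItem = Union[str, dict[str, Any]]

ENTITIES: list[SchemaItem] = [
    "Person",
    {
        "label": "House",
        "description": "Family the person belongs to",
        "properties": [{"name": "name", "type": "STRING"}],
    },
    {
        "label": "Planet",
        "properties": [
            {"name": "name", "type": "STRING"},
            {"name": "weather", "type": "STRING"},
        ],
    },
]

RELATIONS: list[SchemaItem] = [
    "PARENT_OF",
    {
        "label": "HEIR_OF",
        "description": "Used for inheritor relationship between father and sons",
    },
    {"label": "RULES", "properties": [{"name": "fromYear", "type": "INTEGER"}]},
]

POTENTIAL_SCHEMA = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]


def labels_of(items: list[SchemaItem]) -> set[str]:
    """Collect labels from a mixed list of strings and dicts (hypothetical helper)."""
    return {item if isinstance(item, str) else item["label"] for item in items}


def undeclared_triples(schema: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return the potential_schema triples that use a label missing from the schema."""
    entity_labels = labels_of(schema["entities"])
    relation_labels = labels_of(schema["relations"])
    return [
        (source, relation, target)
        for source, relation, target in schema["potential_schema"]
        if source not in entity_labels
        or target not in entity_labels
        or relation not in relation_labels
    ]


# Same shape as the dict passed to the `schema` parameter above.
schema = {
    "entities": ENTITIES,
    "relations": RELATIONS,
    "potential_schema": POTENTIAL_SCHEMA,
}

problems = undeclared_triples(schema)
print(problems)
```

Running a check of this kind before building the pipeline is optional; it only catches typos in labels and says nothing about how the library itself validates the schema.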
Extra configurations
129
136
--------------------
130
137
@@ -412,41 +419,44 @@ within the configuration file.
412
419
"neo4j_database": "myDb",
413
420
"on_error": "IGNORE",
414
421
"prompt_template": "...",
415
-
"entities": [
416
-
"Person",
417
-
{
418
-
"label": "House",
419
-
"description": "Family the person belongs to",
420
-
"properties": [
421
-
{"name": "name", "type": "STRING"}
422
-
]
423
-
},
424
-
{
425
-
"label": "Planet",
426
-
"properties": [
427
-
{"name": "name", "type": "STRING"},
428
-
{"name": "weather", "type": "STRING"}
429
-
]
430
-
}
431
-
],
432
-
"relations": [
433
-
"PARENT_OF",
434
-
{
435
-
"label": "HEIR_OF",
436
-
"description": "Used for inheritor relationship between father and sons"
437
-
},
438
-
{
439
-
"label": "RULES",
440
-
"properties": [
441
-
{"name": "fromYear", "type": "INTEGER"}
442
-
]
443
-
}
444
-
],
445
-
"potential_schema": [
446
-
["Person", "PARENT_OF", "Person"],
447
-
["Person", "HEIR_OF", "House"],
448
-
["House", "RULES", "Planet"]
449
-
],
422
+
423
+
"schema": {
424
+
"entities": [
425
+
"Person",
426
+
{
427
+
"label": "House",
428
+
"description": "Family the person belongs to",
429
+
"properties": [
430
+
{"name": "name", "type": "STRING"}
431
+
]
432
+
},
433
+
{
434
+
"label": "Planet",
435
+
"properties": [
436
+
{"name": "name", "type": "STRING"},
437
+
{"name": "weather", "type": "STRING"}
438
+
]
439
+
}
440
+
],
441
+
"relations": [
442
+
"PARENT_OF",
443
+
{
444
+
"label": "HEIR_OF",
445
+
"description": "Used for inheritor relationship between father and sons"
446
+
},
447
+
{
448
+
"label": "RULES",
449
+
"properties": [
450
+
{"name": "fromYear", "type": "INTEGER"}
451
+
]
452
+
}
453
+
],
454
+
"potential_schema": [
455
+
["Person", "PARENT_OF", "Person"],
456
+
["Person", "HEIR_OF", "House"],
457
+
["House", "RULES", "Planet"]
458
+
]
459
+
},
450
460
"lexical_graph_config": {
451
461
"chunk_node_label": "TextPart"
452
462
}
@@ -462,31 +472,32 @@ or in YAML:
462
472
neo4j_database: myDb
463
473
on_error: IGNORE
464
474
prompt_template: ...
465
-
entities:
466
-
- label: Person
467
-
- label: House
468
-
description: Family the person belongs to
469
-
properties:
470
-
- name: name
471
-
type: STRING
472
-
- label: Planet
473
-
properties:
474
-
- name: name
475
-
type: STRING
476
-
- name: weather
477
-
type: STRING
478
-
relations:
479
-
- label: PARENT_OF
480
-
- label: HEIR_OF
481
-
description: Used for inheritor relationship between father and sons
482
-
- label: RULES
483
-
properties:
484
-
- name: fromYear
485
-
type: INTEGER
486
-
potential_schema:
487
-
- ["Person", "PARENT_OF", "Person"]
488
-
- ["Person", "HEIR_OF", "House"]
489
-
- ["House", "RULES", "Planet"]
475
+
schema:
476
+
entities:
477
+
- Person
478
+
- label: House
479
+
description: Family the person belongs to
480
+
properties:
481
+
- name: name
482
+
type: STRING
483
+
- label: Planet
484
+
properties:
485
+
- name: name
486
+
type: STRING
487
+
- name: weather
488
+
type: STRING
489
+
relations:
490
+
- PARENT_OF
491
+
- label: HEIR_OF
492
+
description: Used for inheritor relationship between father and sons
493
+
- label: RULES
494
+
properties:
495
+
- name: fromYear
496
+
type: INTEGER
497
+
potential_schema:
498
+
- ["Person", "PARENT_OF", "Person"]
499
+
- ["Person", "HEIR_OF", "House"]
500
+
- ["House", "RULES", "Planet"]
490
501
lexical_graph_config:
491
502
chunk_node_label: TextPart
492
503
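The JSON and YAML hunks above move `entities`, `relations` and `potential_schema` from the enclosing mapping into a nested `schema` key. For existing configuration files, that move could be scripted along the following lines. This is a minimal sketch under stated assumptions: `nest_schema_keys` is a hypothetical helper, not something `neo4j_graphrag` provides, and it operates on the already-parsed mapping that holds these keys (for example the result of `json.load` or of a YAML parser).

```python
import json
from typing import Any

# Keys that the change above moves under "schema".
SCHEMA_KEYS = ("entities", "relations", "potential_schema")


def nest_schema_keys(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with legacy schema keys nested under "schema".

    Hypothetical helper: values already present under "schema" take
    precedence over legacy top-level values with the same name.
    """
    migrated = {key: value for key, value in config.items() if key not in SCHEMA_KEYS}
    legacy = {key: config[key] for key in SCHEMA_KEYS if key in config}
    if legacy:
        nested = dict(legacy)
        nested.update(config.get("schema", {}))
        migrated["schema"] = nested
    return migrated


# A shortened version of the old-style configuration fragment.
old_config = json.loads(
    """
    {
        "on_error": "IGNORE",
        "entities": ["Person", {"label": "House"}],
        "relations": ["PARENT_OF"],
        "potential_schema": [["Person", "PARENT_OF", "Person"]],
        "lexical_graph_config": {"chunk_node_label": "TextPart"}
    }
    """
)

new_config = nest_schema_keys(old_config)
print(json.dumps(new_config, indent=2))
```

The helper returns a new dict instead of mutating its argument, so applying it to an already migrated configuration leaves that configuration unchanged.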
@@ -791,6 +802,44 @@ Here is a code block illustrating these concepts:
791
802
After validation, this schema is saved in a `SchemaConfig` object, whose dict representation is passed
792
803
to the LLM.
793
804
805
+
Automatic Schema Extraction
806
+
---------------------------
807
+
808
+
Instead of manually defining the schema, you can use the `SchemaFromTextExtractor` component to automatically extract a schema from your text using an LLM:
809
+
810
+
.. code:: python
811
+
812
+
from neo4j_graphrag.experimental.components.schema import SchemaFromTextExtractor
813
+
from neo4j_graphrag.llm import OpenAILLM
814
+
815
+
# Instantiate the automatic schema extractor component
The `SchemaFromTextExtractor` component analyzes the text and identifies entity types, relationship types, and their property types. It creates a complete `SchemaConfig` object that can be used in the same way as a manually defined schema.
830
+
831
+
You can also save and reload the extracted schema:
832
+
833
+
.. code:: python
834
+
835
+
# Save the schema to JSON or YAML files
836
+
schema_config.store_as_json("my_schema.json")
837
+
schema_config.store_as_yaml("my_schema.yaml")
838
+
839
+
# Later, reload the schema from file
840
+
from neo4j_graphrag.experimental.components.schema import SchemaConfig
841
+
restored_schema = SchemaConfig.from_file("my_schema.json") # or my_schema.yaml
842
+
794
843
795
844
Entity and Relation Extractor
796
845
=============================
@@ -832,6 +881,8 @@ The LLM to use can be customized, the only constraint is that it obeys the :ref:
832
881
833
882
Schema Enforcement Behaviour
834
883
----------------------------
884
+
.. _schema-enforcement-behaviour:
885
+
835
886
By default, even if a schema is provided to guide the LLM in the entity and relation extraction, the LLM response is not validated against that schema.
836
887
This behaviour can be changed by using the `enforce_schema` flag in the `LLMEntityRelationExtractor` constructor: