CHANGELOG.md (9 additions & 0 deletions)
@@ -12,8 +12,17 @@
12
12
13
13
### Changed
14
14
15
+
#### Strict mode
16
+
15
17
- Strict mode in `SimpleKGPipeline`: now properties and relationships are pruned only if they are defined in the input schema.
16
18
19
+
#### Schema definition
20
+
21
+
- The `SchemaEntity` model has been renamed `NodeType`.
22
+
- The `SchemaRelation` model has been renamed `RelationshipType`.
23
+
- The `SchemaProperty` model has been renamed `PropertyType`.
24
+
- `SchemaConfig` has been removed in favor of `GraphSchema` (used in the `SchemaBuilder` and `EntityRelationExtractor` classes). `entities`, `relations` and `potential_schema` fields have also been renamed `node_types`, `relationship_types` and `patterns` respectively.
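
For code that still builds a schema dictionary with the old field names, the renames listed in the changelog entries above amount to a mapping of top-level keys. The snippet below is a minimal sketch under that assumption. `LEGACY_KEYS` and `migrate_schema_dict` are hypothetical names used only for illustration and are not part of `neo4j_graphrag`.

```python
from typing import Any

# Old field name -> new field name, as listed in the changelog entries above.
LEGACY_KEYS = {
    "entities": "node_types",
    "relations": "relationship_types",
    "potential_schema": "patterns",
}


def migrate_schema_dict(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` with legacy top-level keys renamed.

    Hypothetical helper, not a library function. Keys that already use
    the new names, and any unknown keys, are copied unchanged.
    """
    migrated: dict[str, Any] = {}
    for key, value in schema.items():
        new_key = LEGACY_KEYS.get(key, key)
        if new_key in migrated:
            # Refuse to silently merge e.g. "entities" and "node_types".
            raise ValueError(f"both legacy and new form of {new_key!r} are present")
        migrated[new_key] = value
    return migrated


# Illustrative input using the labels from the user guide examples.
old_schema = {
    "entities": ["Person", "House", "Planet"],
    "relations": ["PARENT_OF", "HEIR_OF", "RULES"],
    "potential_schema": [("Person", "PARENT_OF", "Person")],
}
new_schema = migrate_schema_dict(old_schema)
```

The helper only touches top-level keys. The changelog does not describe any change to the nested label, description and properties entries, so they are passed through as-is.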
docs/source/user_guide_kg_builder.rst (42 additions & 44 deletions)
@@ -21,7 +21,7 @@ A Knowledge Graph (KG) construction pipeline requires a few components (some of
21
21
- **Data loader**: extract text from files (PDFs, ...).
22
22
- **Text splitter**: split the text into smaller pieces of text (chunks), manageable by the LLM context window (token limit).
23
23
- **Chunk embedder** (optional): compute the chunk embeddings.
24
-
- **Schema builder**: provide a schema to ground the LLM extracted entities and relations and obtain an easily navigable KG. Schema can be provided manually or extracted automatically using LLMs.
24
+
- **Schema builder**: provide a schema to ground the LLM extracted node and relationship types and obtain an easily navigable KG. Schema can be provided manually or extracted automatically using LLMs.
25
25
- **Lexical graph builder**: build the lexical graph (Document, Chunk and their relationships) (optional).
26
26
- **Entity and relation extractor**: extract relevant entities and relations from the text.
27
27
- **Knowledge Graph writer**: save the identified entities and relations.
@@ -73,18 +73,18 @@ Customizing the SimpleKGPipeline
73
73
Graph Schema
74
74
------------
75
75
76
-
It is possible to guide the LLM by supplying a list of entities, relationships,
77
-
and instructions on how to connect them. However, note that the extracted graph
78
-
may not fully adhere to these guidelines unless schema enforcement is enabled
79
-
(see :ref:`Schema Enforcement Behaviour`). Entities and relationships can be represented
76
+
It is possible to guide the LLM by supplying a list of node and relationship types,
77
+
and instructions on how to connect them (patterns). However, note that the extracted graph
78
+
may not fully adhere to these guidelines unless schema enforcement is enabled
79
+
(see :ref:`Schema Enforcement Behaviour`). Node and relationship types can be represented
80
80
as either simple strings (for their labels) or dictionaries. If using a dictionary,
81
81
it must include a label key and can optionally include description and properties keys,
The `potential_schema` is defined by a list of triplet in the format:
105
+
The `patterns` are defined by a list of triplets in the format:
106
106
`(source_node_label, relationship_label, target_node_label)`. For instance:
107
107
108
108
109
109
.. code:: python
110
110
111
-
POTENTIAL_SCHEMA = [
111
+
PATTERNS = [
112
112
("Person", "PARENT_OF", "Person"),
113
113
("Person", "HEIR_OF", "House"),
114
114
("House", "RULES", "Planet"),
@@ -122,15 +122,15 @@ This schema information can be provided to the `SimpleKGBuilder` as demonstrated
122
122
kg_builder = SimpleKGPipeline(
123
123
# ...
124
124
schema={
125
-
"entities": ENTITIES,
126
-
"relations": RELATIONS,
127
-
"potential_schema": POTENTIAL_SCHEMA
125
+
"node_types": NODE_TYPES,
126
+
"relationship_types": RELATIONSHIP_TYPES,
127
+
"patterns": PATTERNS
128
128
},
129
129
# ...
130
130
)
131
131
132
132
.. note::
133
-
By default, if no schema is provided to the SimpleKGPipeline, automatic schema extraction will be performed using the LLM (See the :ref:`Automatic Schema Extraction with SchemaFromTextExtractor`).
133
+
By default, if no schema is provided to the SimpleKGPipeline, automatic schema extraction will be performed using the LLM (See the :ref:`Automatic Schema Extraction`).
134
134
135
135
Extra configurations
136
136
--------------------
@@ -419,9 +419,8 @@ within the configuration file.
419
419
"neo4j_database": "myDb",
420
420
"on_error": "IGNORE",
421
421
"prompt_template": "...",
422
-
423
422
"schema": {
424
-
"entities": [
423
+
"node_types": [
425
424
"Person",
426
425
{
427
426
"label": "House",
@@ -438,7 +437,7 @@ within the configuration file.
438
437
]
439
438
}
440
439
],
441
-
"relations": [
440
+
"relationship_types": [
442
441
"PARENT_OF",
443
442
{
444
443
"label": "HEIR_OF",
@@ -451,7 +450,7 @@ within the configuration file.
451
450
]
452
451
}
453
452
],
454
-
"potential_schema": [
453
+
"patterns": [
455
454
["Person", "PARENT_OF", "Person"],
456
455
["Person", "HEIR_OF", "House"],
457
456
["House", "RULES", "Planet"]
@@ -473,7 +472,7 @@ or in YAML:
473
472
on_error: IGNORE
474
473
prompt_template: ...
475
474
schema:
476
-
entities:
475
+
node_types:
477
476
- Person
478
477
- label: House
479
478
description: Family the person belongs to
@@ -486,15 +485,15 @@ or in YAML:
486
485
type: STRING
487
486
- name: weather
488
487
type: STRING
489
-
relations:
488
+
relationship_types:
490
489
- PARENT_OF
491
490
- label: HEIR_OF
492
491
description: Used for inheritor relationship between father and sons
493
492
- label: RULES
494
493
properties:
495
494
- name: fromYear
496
495
type: INTEGER
497
-
potential_schema:
496
+
patterns:
498
497
- ["Person", "PARENT_OF", "Person"]
499
498
- ["Person", "HEIR_OF", "House"]
500
499
- ["House", "RULES", "Planet"]
@@ -747,62 +746,62 @@ Optionally, the document and chunk node labels can be configured using a `Lexica
747
746
Schema Builder
748
747
==============
749
748
750
-
The schema is used to try and ground the LLM to a list of possible entities and relations of interest.
749
+
The schema is used to try and ground the LLM to a list of possible node and relationship types of interest.
751
750
So far, schema must be manually created by specifying:
752
751
753
-
- **Entities** the LLM should look for in the text, including their properties (name and type).
754
-
- **Relations** of interest between these entities, including the relation properties (name and type).
755
-
- **Triplets** to define the start (source) and end (target) entity types for each relation.
752
+
- **Node types** the LLM should look for in the text, including their properties (name and type).
753
+
- **Relationship types** of interest between these node types, including the relationship properties (name and type).
754
+
- **Patterns** (triplets) to define the start (source) and end (target) entity types for each relationship.
756
755
757
756
Here is a code block illustrating these concepts:
758
757
759
758
.. code:: python
760
759
761
760
from neo4j_graphrag.experimental.components.schema import (
After validation, this schema is saved in a `SchemaConfig` object, whose dict representation is passed
801
+
After validation, this schema is saved in a `GraphSchema` object, whose dict representation is passed
803
802
to the LLM.
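
One consistency check that such a validation step could plausibly include is that every pattern refers only to declared node and relationship labels. The following is a stand-alone sketch of that idea. `check_patterns` is a hypothetical function written for illustration, the library's actual validation may differ, and the sample data reuses the labels from this guide.

```python
def check_patterns(node_types, relationship_types, patterns):
    """Return a list of problems found in `patterns`; empty if consistent.

    Hypothetical helper, not the library's validation logic.
    """

    def label_of(item):
        # An item is either a plain label string or a dict with a "label" key.
        return item if isinstance(item, str) else item["label"]

    node_labels = {label_of(n) for n in node_types}
    rel_labels = {label_of(r) for r in relationship_types}

    problems = []
    for source, relationship, target in patterns:
        if source not in node_labels:
            problems.append(f"unknown source node type: {source}")
        if relationship not in rel_labels:
            problems.append(f"unknown relationship type: {relationship}")
        if target not in node_labels:
            problems.append(f"unknown target node type: {target}")
    return problems


# Illustrative data: types may be plain strings or dicts with a "label" key.
NODE_TYPES = [
    "Person",
    {"label": "House", "description": "Family the person belongs to"},
    "Planet",
]
RELATIONSHIP_TYPES = ["PARENT_OF", "HEIR_OF", "RULES"]
PATTERNS = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]

problems = check_patterns(NODE_TYPES, RELATIONSHIP_TYPES, PATTERNS)
```

Returning a list of problems rather than raising on the first one makes it easier to report every inconsistent pattern at once.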
804
803
805
-
Automatic Schema Extraction
804
+
Automatic Schema Extraction
806
805
---------------------------
807
806
808
807
Instead of manually defining the schema, you can use the `SchemaFromTextExtractor` component to automatically extract a schema from your text using an LLM:
@@ -826,19 +825,19 @@ Instead of manually defining the schema, you can use the `SchemaFromTextExtracto
The `SchemaFromTextExtractor` component analyzes the text and identifies entity types, relationship types, and their property types. It creates a complete `SchemaConfig` object that can be used in the same way as a manually defined schema.
828
+
The `SchemaFromTextExtractor` component analyzes the text and identifies entity types, relationship types, and their property types. It creates a complete `GraphSchema` object that can be used in the same way as a manually defined schema.
830
829
831
830
You can also save and reload the extracted schema:
832
831
833
832
.. code:: python
834
833
835
834
# Save the schema to JSON or YAML files
836
-
schema_config.store_as_json("my_schema.json")
837
-
schema_config.store_as_yaml("my_schema.yaml")
838
-
835
+
extracted_schema.store_as_json("my_schema.json")
836
+
extracted_schema.store_as_yaml("my_schema.yaml")
837
+
839
838
# Later, reload the schema from file
840
-
from neo4j_graphrag.experimental.components.schema import SchemaConfig
841
-
restored_schema = SchemaConfig.from_file("my_schema.json") # or my_schema.yaml
839
+
from neo4j_graphrag.experimental.components.schema import GraphSchema
840
+
restored_schema = GraphSchema.from_file("my_schema.json") # or my_schema.yaml
842
841
843
842
844
843
Entity and Relation Extractor
@@ -993,7 +992,6 @@ If more customization is needed, it is possible to subclass the `EntityRelationE
993
992
994
993
from pydantic import validate_call
995
994
from neo4j_graphrag.experimental.components.entity_relation_extractor import EntityRelationExtractor
996
-
from neo4j_graphrag.experimental.components.schema import SchemaConfig
997
995
from neo4j_graphrag.experimental.components.types import (