+    """Provides the ability to create a string from a list of tokens that are contained in a span.
+
+    The specified tokenProperty is used to extract the values from the tokens when creating the string.
+    For SCNLP, this tokenProperty could be values like 'orig', 'lemma', or 'pos'. The spans would typically be a SCNLP 'sentence' or could even be things like an OM 'ce:para'.
+
+    Args:
+        tokens: Dataframe of AQAnnotations (which we will use to concatenate for the string)
+        spans: Dataframe of AQAnnotations (identifies the start/end for the tokens to be used for the concatenated string)
+        tokenProperty: The property field in the tokens to use for extracting the value for the concatenated string
+
+    Returns:
+        Dataframe[AQAnnotation] spans with 3 new properties, all prefixed with the specified tokenProperty value followed by (ToksStr, ToksSpos, ToksEpos). The ToksStr property will be the
+        concatenated string of token property values contained in the span. The ToksSpos and ToksEpos are properties that will help us determine the start/end offset for each of the individual tokens in the ToksStr.
+        These helper properties are needed for the function RegexTokensSpan so we can generate accurate start/end offsets based on the str file.
+    """
+    def process(rec):
+        span = rec[0]
+        tokens = rec[1]
+        newProps = {}
+        oldProps = span.properties
+        for key in oldProps.keys():
+            newProps[key] = oldProps[key]
+        toksStr = []
+        toksSpos = []
+        toksEpos = []
+        offset = 0
+        for token in tokens:
+            tokeStr = ""
+            if (token.properties != None) and (tokenProperty in token.properties):
README.md (+6 -1)
@@ -38,7 +38,7 @@ We realize that you can't have a Dataframe[AQAnnotation] like you can with scala

 #### Utilities

-The GetAQAnnotation and GetCATAnnotation utility classes have been developed to create an AQAnnotation from the archive format (CATAnnotation) and vice versa. When creating the AQAnnotation, the ampersand-separated string of name-value pairs in the CATAnnotation other field is mapped to a Map in the AQAnnotation record. To minimize memory consumption and increase performance, you can specify which name-value pairs to include in the Map. For more details on the implementation, view the corresponding class for each function in the AQPython Utilities module. For usage examples, view the GetAQAnnotation and GetCATAnnotation classes in the test_utilities module.
+The GetAQAnnotation and GetCATAnnotation utility classes have been developed to create an AQAnnotation from the archive format (CATAnnotation) and vice versa. When creating the AQAnnotation, the ampersand-separated string of name-value pairs in the CATAnnotation other field is mapped to a Map in the AQAnnotation record. To minimize memory consumption and increase performance, you can specify which name-value pairs to include in the Map as well as which ones to decode or lowercase. If you want all name-value pairs to be included in the Map, simply specify a value of ["*"] for the parameter in the function. For more details on the implementation, view the corresponding class for each function in the AQPython Utilities module. For usage examples, view the GetAQAnnotation and GetCATAnnotation classes in the test_utilities module.


 #### AnnotationQuery Functions
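The name-value filtering described in the Utilities paragraph above can be illustrated with a small plain-Python sketch. It is not the AQPython implementation: the helper name `parse_other`, its parameter names, the `=` separator between name and value, and the use of URL-style decoding are all assumptions made for illustration. Only the ampersand separator and the `["*"]` wildcard come from the README text.

```python
from urllib.parse import unquote_plus


def parse_other(other, wanted=("*",), decode=(), lower=()):
    """Hypothetical helper: map an 'a=1&b=2' string to a dict of properties."""
    props = {}
    if not other:
        return props
    keep_all = "*" in wanted  # ["*"] means "include every name-value pair"
    for pair in other.split("&"):
        if not pair:
            continue
        # Assumption: names and values are separated by '='.
        name, _, value = pair.partition("=")
        if not (keep_all or name in wanted):
            continue  # skipping unwanted pairs keeps the map small
        if name in decode:
            value = unquote_plus(value)  # assumption: URL-style encoding
        if name in lower:
            value = value.lower()
        props[name] = value
    return props


example = "orig=Hello%20World&lemma=hello&pos=NN"
all_props = parse_other(example)
some_props = parse_other(example, wanted=["orig"], decode=["orig"], lower=["orig"])
```

Here `all_props` keeps all three pairs untouched, while `some_props` keeps only `orig`, decoded and lowercased.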
@@ -57,6 +57,8 @@ The following functions are currently provided by AnnotationQuery. Since functio

 **ContainedIn** - Provide the ability to find annotations that are contained by another annotation. The input is 2 Dataframes of AQAnnotations. We will call them A and B. The purpose is to find those annotations in A that are contained in B. What that means is the start/end offset for an annotation from A must be contained by the start/end offset from an annotation in B. We of course have to also match on the document id. We ultimately return the contained annotations (A) that meet these criteria. There is also the option of negating the query (think Not Contains) so that we return only A where it is not contained in B.

+**ContainedInList** - Provide the ability to find annotations that are contained by another annotation. The input is 2 Dataframes of AQAnnotations. We will call them A and B. The purpose is to find those annotations in A that are contained in B. What that means is the start/end offset for an annotation from A must be contained by the start/end offset from an annotation in B. We of course have to also match on the document id. We ultimately return a Dataframe with 2 fields where the first field is an annotation from B and the second field is an array of entries from A that are contained in the first entry.
+
 **Before** - Provide the ability to find annotations that are before another annotation. The input is 2 Dataframes of AQAnnotations. We will call them A and B. The purpose is to find those annotations in A that are before B. What that means is the end offset for an annotation from A must be before the start offset from an annotation in B. We of course have to also match on the document id. We ultimately return the A annotations that meet these criteria. A distance operator can also be optionally specified. This would require an A annotation (endOffset) to occur n characters (or less) before the B annotation (startOffset). There is also the option of negating the query (think Not Before) so that we return only A where it is not before B.

 **After** - Provide the ability to find annotations that are after another annotation. The input is 2 Dataframes of AQAnnotations. We will call them A and B. The purpose is to find those annotations in A that are after B. What that means is the start offset for an annotation from A must be after the end offset from an annotation in B. We of course have to also match on the document id. We ultimately return the A annotations that meet these criteria. A distance operator can also be optionally specified. This would require an A annotation (startOffset) to occur n characters (or less) after the B annotation (endOffset). There is also the option of negating the query (think Not After) so that we return only A where it is not after B.
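The offset rules described for ContainedIn, ContainedInList, Before, and After can be sketched on plain Python objects. This is only an illustration of the comparisons, not the Dataframe-based AnnotationQuery code. The `Ann` class and the function names are hypothetical, and two details are assumptions the README does not settle: boundaries are treated as inclusive (touching offsets count), and the list variant drops B annotations that contain nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ann:
    """Hypothetical stand-in for an AQAnnotation (only the fields used here)."""
    docId: str
    startOffset: int
    endOffset: int


def contained_in(a, b):
    # Same document, and A's span lies within B's span (inclusive: assumption).
    return (a.docId == b.docId
            and a.startOffset >= b.startOffset
            and a.endOffset <= b.endOffset)


def before(a, b, dist=None):
    # A must end at or before the point where B starts, in the same document.
    if a.docId != b.docId or a.endOffset > b.startOffset:
        return False
    # Optional distance operator: at most `dist` characters between them.
    return dist is None or b.startOffset - a.endOffset <= dist


def after(a, b, dist=None):
    # "A after B" is the mirror image of "B before A".
    return before(b, a, dist)


def contained_in_list(A, B):
    # Pair each B with the list of A annotations it contains.
    # Assumption: B annotations that contain nothing are omitted.
    pairs = [(b, [a for a in A if contained_in(a, b)]) for b in B]
    return [(b, inner) for b, inner in pairs if inner]


sentence = Ann("d1", 0, 20)
w1, w2, w3 = Ann("d1", 0, 5), Ann("d1", 6, 11), Ann("d2", 0, 5)

contained = [a for a in (w1, w2, w3) if contained_in(a, sentence)]
grouped = contained_in_list([w1, w2, w3], [sentence])
```

In this example `w3` is excluded because it belongs to a different document, which is the document id match the descriptions above require.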
@@ -75,6 +77,9 @@ The following functions are currently provided by AnnotationQuery. Since functio

 **Following** - Return the following sibling annotations for every annotation in the anchor Dataframe[AQAnnotations]. The following sibling annotations can optionally be required to be contained in a container Dataframe[AQAnnotations]. The return type of this function is different from other functions. Instead of returning a Dataframe[AQAnnotation] this function returns a Dataframe[(AQAnnotation,Array[AQAnnotation])].

+**TokensSpan** - Provides the ability to create a string from a list of tokens that are contained in a span. The specified tokenProperty is used to extract the values from the tokens when creating the string. For SCNLP, this tokenProperty could be values like 'orig', 'lemma', or 'pos'. The spans would typically be a SCNLP 'sentence' or could even be things like an OM 'ce:para'. Returns a Dataframe[AQAnnotation] spans with 3 new properties, all prefixed with the specified tokenProperty value followed by (ToksStr, ToksSpos, ToksEpos). The ToksStr property will be the concatenated string of token property values contained in the span. The ToksSpos and ToksEpos are properties that will help us determine the start/end offset for each of the individual tokens in the ToksStr. These helper properties are needed for the function RegexTokensSpan so we can generate accurate start/end offsets based on the str file.
+
+**RegexTokensSpan** - Provides the ability to apply a regular expression to the concatenated string generated by TokensSpan. For the strings matching the regex, a Dataframe[AQAnnotations] will be returned. The AQAnnotation will correspond to the offsets within the concatenated string containing the match.
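To make the relationship between TokensSpan and RegexTokensSpan concrete, here is a minimal plain-Python sketch of the idea: build one string from a token property, remember where each token starts and ends inside that string, then translate regex match positions back to document offsets. It is not the AQPython code. The function names, the tuple layout of a token, the single-space separator, the dict representation of the start/end helpers, and the choice to drop matches that do not line up with token boundaries are all assumptions.

```python
import re


def tokens_span(tokens, token_property, sep=" "):
    """tokens: list of (startOffset, endOffset, properties) tuples (assumed layout)."""
    pieces = []
    spos = {}  # index in the string where a token starts -> document startOffset
    epos = {}  # index in the string where a token ends   -> document endOffset
    offset = 0
    for start, end, props in tokens:
        value = (props or {}).get(token_property, "")
        spos[offset] = start
        offset += len(value)
        epos[offset] = end
        pieces.append(value)
        offset += len(sep)  # account for the separator before the next token
    return sep.join(pieces), spos, epos


def regex_tokens_span(toks_str, spos, epos, pattern):
    """Return (startOffset, endOffset) document offsets for token-aligned matches."""
    hits = []
    for m in re.finditer(pattern, toks_str):
        # Assumption: only matches that begin and end on token boundaries are kept.
        if m.start() in spos and m.end() in epos:
            hits.append((spos[m.start()], epos[m.end()]))
    return hits


# "The cats  sat": document offsets differ from positions in the lemma string.
tokens = [
    (0, 3, {"orig": "The", "lemma": "the"}),
    (4, 8, {"orig": "cats", "lemma": "cat"}),
    (10, 13, {"orig": "sat", "lemma": "sit"}),
]
toks_str, spos, epos = tokens_span(tokens, "lemma")
hits = regex_tokens_span(toks_str, spos, epos, r"cat sit")
```

The lemma string is shorter than the original text, so a match position in the string cannot be used directly as a document offset. That gap is what the two helper lookups bridge.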