In natural language processing, open information extraction (OIE) is the task of generating a structured, machine-readable representation of the information in text, usually in the form of triples or n-ary propositions.

Overview

A proposition can be understood as a truth-bearer, a textual expression of a potential fact (e.g., "Dante wrote the Divine Comedy"), represented in a structure amenable to computers. An OIE extraction normally consists of a relation and a set of arguments. For instance, ("Dante", "passed away in", "Ravenna") is a proposition formed by the relation "passed away in" and the arguments "Dante" and "Ravenna". The first argument is usually referred to as the subject, while the second is considered the object.

The extraction is said to be a textual representation of a potential fact because its elements are not linked to a knowledge base. Furthermore, the factual nature of the proposition has not yet been established. In the above example, transforming the extraction into a full-fledged fact would first require linking, if possible, the relation and the arguments to a knowledge base. Second, the truth of the extraction would need to be determined. In computer science, transforming OIE extractions into ontological facts is known as relation extraction.

In fact, OIE can be seen as the first step to a wide range of deeper text understanding tasks such as relation extraction, knowledge-base construction, question answering, and semantic role labeling. The extracted propositions can also be directly used for end-user applications such as structured search (e.g., retrieve all propositions with "Dante" as subject).

OIE was first introduced by TextRunner, developed at the University of Washington Turing Center headed by Oren Etzioni. Methods introduced later, such as Reverb, OLLIE, ClausIE, and CSD, helped to shape the OIE task by characterizing some of its aspects. At a high level, all of these approaches make use of a set of patterns to generate the extractions. Depending on the particular approach, these patterns are either hand-crafted or learned.

OIE systems and contributions

Reverb argued that relations must be meaningful in order to capture the information in the input text accurately. For instance, given the sentence "Faust made a pact with the devil", it would be erroneous to produce only the extraction ("Faust", "made", "a pact"), since it would not be adequately informative. A more precise extraction would be ("Faust", "made a pact with", "the devil"). Reverb also argued against the generation of overspecific relations.

OLLIE stressed two important aspects of OIE. First, it pointed to the lack of factuality of the propositions. For instance, in a sentence like "If John studies hard, he will pass the exam", it would be inaccurate to consider ("John", "will pass", "the exam") as a fact. Additionally, the authors indicated that an OIE system should be able to extract non-verb-mediated relations, which account for a significant portion of the information expressed in natural language text. For instance, in the sentence "Obama, the former US president, was born in Hawaii", an OIE system should be able to recognize the proposition ("Obama", "is", "former US president").

ClausIE introduced the connection between grammatical clauses, propositions, and OIE extractions. The authors stated that, since each grammatical clause expresses a proposition, each verb-mediated proposition can be identified solely by recognizing the set of clauses expressed in each sentence. This implies that to correctly recognize the set of propositions in an input sentence, it is necessary to understand its grammatical structure. The authors studied the case of English, which admits only seven clause types, meaning that identifying each proposition only requires defining seven grammatical patterns.

The finding also established a separation between the recognition of the propositions and their materialization. In a first step, the proposition can be identified without any consideration of its final form, in a domain-independent and unsupervised way, mostly based on linguistic principles. In a second step, the information can be represented according to the requirements of the underlying application, without conditioning the identification phase.

Consider the sentence "Albert Einstein was born in Ulm and died in Princeton". The first step will recognize the two propositions ("Albert Einstein", "was born", "in Ulm") and ("Albert Einstein", "died", "in Princeton"). Once the information has been correctly identified, the propositions can take the particular form required by the underlying application.

CSD introduced the idea of minimality in OIE. It holds that computers can make better use of the extractions if they are expressed in a compact way. This is especially important in sentences with subordinate clauses. In these cases, CSD suggests the generation of nested extractions. For example, consider the sentence "The Embassy said that 6,700 Americans were in Pakistan". CSD generates two extractions, (i) ("6,700 Americans", "were", "in Pakistan") and (ii) ("The Embassy", "said", "that (i)"), where the second extraction takes the first as its argument. This is usually known as reification.
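The ideas above (an extraction as a relation with a subject and an object, structured search by subject, and nested extractions) can be illustrated with a minimal sketch. The class name `Extraction`, the helper `by_subject`, and the decision to store the inner extraction directly as the object rather than as a textual reference such as "that (i)" are assumptions made for illustration only; they are not taken from any of the systems discussed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Extraction:
    """A binary OIE extraction (hypothetical type, for illustration)."""

    subject: str
    relation: str
    # The object is either plain text or another extraction (nesting).
    obj: Union[str, "Extraction"]

    def render(self) -> str:
        """Return the extraction in the tuple notation used in the text."""
        if isinstance(self.obj, Extraction):
            inner = self.obj.render()
        else:
            inner = f'"{self.obj}"'
        return f'("{self.subject}", "{self.relation}", {inner})'


def by_subject(extractions: List[Extraction], subject: str) -> List[Extraction]:
    """Structured search: keep only extractions with the given subject."""
    return [e for e in extractions if e.subject == subject]


# Flat extractions taken from the examples in the text.
extractions = [
    Extraction("Dante", "passed away in", "Ravenna"),
    Extraction("Faust", "made a pact with", "the devil"),
    Extraction("Dante", "wrote", "the Divine Comedy"),
]

# Nested extraction for the subordinate-clause example.
inner = Extraction("6,700 Americans", "were", "in Pakistan")
reported = Extraction("The Embassy", "said", inner)

if __name__ == "__main__":
    for e in by_subject(extractions, "Dante"):
        print(e.render())
    print(reported.render())
```

Keeping the inner extraction as a value, instead of flattening the subordinate clause into the outer object string, means the statement about Americans in Pakistan stays attributed to the Embassy and is not asserted on its own.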