Enterprise search is software technology for searching data sources internal to a company, typically intranet and database content. The search is generally offered only to users internal to the company. Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer.

Enterprise search systems index data and documents from a variety of sources such as file systems, intranets, document management systems, e-mail, and databases. Many enterprise search systems integrate structured and unstructured data in their collections. Enterprise search systems also use access controls to enforce a security policy on their users.

Enterprise search can be seen as a type of vertical search of an enterprise.

Components of an enterprise search system

In an enterprise search system, content goes through various phases from source repository to search results:

Content awareness

Content awareness (or "content collection") usually follows either a push or a pull model. In the push model, a source system is integrated with the search engine in such a way that it connects to it and pushes new content directly to its APIs. This model is used when real-time indexing is important. In the pull model, the software gathers content from sources using a connector such as a web crawler or a database connector. The connector typically polls the source at regular intervals to look for new, updated or deleted content.

Content processing and analysis

Content from different sources may have many different formats or document types, such as XML, HTML, Office document formats or plain text. The content processing phase converts the incoming documents to plain text using document filters. It is also often necessary to normalize content in various ways to improve recall or precision. These may include stemming, lemmatization, synonym expansion, entity extraction, and part-of-speech tagging.

As part of processing and analysis, tokenization is applied to split the content into tokens, which are the basic matching units. It is also common to normalize tokens to lower case to provide case-insensitive search, as well as to normalize accents to provide better recall.

Indexing

The resulting text is stored in an index, which is optimized for quick lookups without storing the full text of the document. The index may contain the dictionary of all unique words in the corpus as well as information about ranking and term frequency.

Query processing

Using a web page, the user issues a query to the system. The query consists of any terms the user enters as well as navigational actions such as faceting and paging information.

Matching

The processed query is then compared to the stored index, and the search system returns results (or "hits") referencing source documents that match. Some systems are able to present the document as it was indexed.

See also

Collaborative search engine
Data defined storage
Enterprise bookmarking
Enterprise information access
Faceted search
Information extraction
Knowledge management
List of search engines
Text mining
Vertical search
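
As an illustration of the tokenization, indexing and matching phases this article describes, the following is a minimal Python sketch. It is not taken from any particular enterprise search product. The function names (`normalize`, `tokenize`, `build_index`, `search`) are hypothetical, and ranking hits by summed term frequency is an assumption made for the example; the article only says that an index may hold ranking and term-frequency information. Stemming, synonym expansion and access controls are left out.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def normalize(token):
    """Lower-case a token and strip accents (e.g. 'Résumé' becomes 'resume')."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split plain text into normalized tokens, the basic matching units."""
    return [normalize(t) for t in re.findall(r"\w+", text)]


def build_index(documents):
    """Build an inverted index: term -> {doc_id: term frequency}.

    Only terms and counts are stored, not the full text of each document.
    """
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, frequency in Counter(tokenize(text)).items():
            index[term][doc_id] = frequency
    return index


def search(index, query):
    """Return ids of documents containing every query term.

    Hits are ordered by summed term frequency (an assumed, simplistic
    ranking), with ties broken by document id.
    """
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matching = set(postings[0]).intersection(*postings[1:])
    scores = {doc_id: sum(p[doc_id] for p in postings) for doc_id in matching}
    return sorted(matching, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical sample corpus, already converted to plain text.
    docs = {
        "a": "Résumé policy for new staff",
        "b": "Staff travel policy; travel policy updates",
        "c": "Cafeteria menu",
    }
    idx = build_index(docs)
    print(search(idx, "resume"))        # ['a']
    print(search(idx, "POLICY"))        # ['b', 'a']
    print(search(idx, "staff policy"))  # ['b', 'a']
```

Because the query passes through the same `tokenize` function as the documents, the case and accent normalization is applied consistently on both sides, which is what makes the search case-insensitive and accent-insensitive in this sketch.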