DOCUMENT Magazine - April 2008 - (Page 28) - www.DOCUMENTmedia.com

ARCHIVING & IMAGING

Recognizing Accuracy

Improve document recognition with context and statistical analysis

By Arthur Gingrande and Paul Traite

With millions of fields of data to read within a transactional enterprise, the accuracy of that data is vital. Validation procedures and statistical analysis can help your recognition engines process data more accurately.

Any method that lets a user tell a recognition system, before classification begins, what kinds of documents it will encounter biases the system toward better recognition performance. Today's recognition engines classify many types of data. These include machine print, handprint, barcodes, cursive writing, check boxes and fill-in ovals. They also include the "pay this amount" fields on checks, in both the numeric and the legal (written-out) form. Although OCR and ICR data dominate the examples in this article, most of the techniques discussed can also improve recognition of the other data types.

Applying the Techniques

When recognition software processes structured forms (in which each field position is completely predefined), many of the following techniques can be applied very early in the process, often by the recognition engine itself. The same techniques also work on semi-structured or unstructured forms. There, however, they may have to be applied after recognition, i.e., after the field data has been located and extracted from the form.

Context Analysis

Context analysis is the most elementary and popular way of improving document recognition accuracy. It involves programming grammatical and lexical rules, edit masks and dictionaries into the forms and document processing systems before recognition, so that the results match the data format of the form being recognized. For example, an end user might choose from a menu of grammatical rules, such as "I before E except after C" or "look for the letter U after the letter Q." Applying such rules improves recognition of words in an open field on a survey form.

Another way to improve accuracy is to apply an edit mask to a field, which tells the ICR engine the field's alphanumeric syntax. Suppose a six-character product code always consists of three alphabetic characters, followed by two numbers and a final alphabetic character. In that case, the edit mask "AAANNA" (A for alphabetic, N for numeric) can be applied to the field. The mask prevents some of the most common substitution errors. Examples include confusing the letter "O" with the number zero, the number one with the letter "l," or the letter "S" with the number five. Users can create collections of edit masks, such as for time fields, date fields, social security numbers and product codes. These can be stored for later use when setting up a form for recognition.

Applying a user-defined, application-specific dictionary of words to a given field also improves accuracy. Multiple dictionaries can be created and stored, each associated with different fields on a form. For example, a dictionary of patient names might be applied to the "name" field on a medical claim submitted to an insurance company. Similarly, a dictionary of medical procedures might be applied to the "procedure" field, and so forth.

However, an ordinary English dictionary,
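As a minimal sketch of the context checks described above, the Python below applies an edit mask such as "AAANNA" to a recognized field. It swaps common look-alike characters (O/0, l/1, S/5) depending on whether a position should hold a letter or a digit. It then checks a value against a field-specific dictionary and flags words that break the "U after Q" rule. The confusion tables, the function names and the 0.8 similarity cutoff are illustrative assumptions, not any product's settings. Python's difflib similarity ratio stands in for whatever matching a real recognition engine uses.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Hypothetical confusion tables for common look-alike substitutions.
# In a numeric ("N") position, letter look-alikes are read as digits;
# in an alphabetic ("A") position, digit look-alikes are read as letters.
# Assumes upper-case codes, so a "1" in a letter position becomes "I".
LETTER_TO_DIGIT = {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "s": "5"}
DIGIT_TO_LETTER = {"0": "O", "1": "I", "5": "S"}


def apply_edit_mask(raw: str, mask: str) -> Optional[str]:
    """Coerce recognizer output to an edit mask such as "AAANNA".

    "A" marks an alphabetic position and "N" a numeric one. Returns the
    corrected value, or None when the field cannot satisfy the mask and
    should be routed to an operator for review.
    """
    if len(raw) != len(mask):
        return None
    corrected = []
    for ch, kind in zip(raw, mask):
        if kind == "N":
            ch = LETTER_TO_DIGIT.get(ch, ch)
            if not ch.isdigit():
                return None
        elif kind == "A":
            ch = DIGIT_TO_LETTER.get(ch, ch)
            if not ch.isalpha():
                return None
        else:
            raise ValueError(f"unknown mask symbol {kind!r}")
        corrected.append(ch)
    return "".join(corrected)


def match_field_dictionary(
    value: str, dictionary: Sequence[str], cutoff: float = 0.8
) -> Optional[str]:
    """Look up a recognized value in the dictionary assigned to its field.

    Exact matches are accepted as-is. Otherwise difflib's similarity ratio
    (a stand-in for an engine's own matcher) proposes the closest entry
    scoring at least `cutoff`; None means no plausible match.
    """
    if value in dictionary:
        return value
    candidates = get_close_matches(value, list(dictionary), n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


def breaks_qu_rule(word: str) -> bool:
    """Flag words in which a "q" is not followed by "u".

    A heuristic for open text fields: words such as "Iraq" are legitimate
    exceptions, so a hit should trigger review rather than rejection.
    """
    w = word.lower()
    return any(c == "q" and w[i + 1 : i + 2] != "u" for i, c in enumerate(w))


if __name__ == "__main__":
    print(apply_edit_mask("P0517S", "AAANNA"))  # POS17S
    print(apply_edit_mask("ABCOlS", "AAANNA"))  # ABC01S
    print(apply_edit_mask("ABC1XD", "AAANNA"))  # None
    procedures = ["APPENDECTOMY", "TONSILLECTOMY", "MASTECTOMY"]
    print(match_field_dictionary("APPENDECT0MY", procedures))  # APPENDECTOMY
    print(breaks_qu_rule("QNALITY"))  # True
```

In this sketch, a field that fails its mask or finds no dictionary match returns None instead of a guess. The intent is that such fields be sent for human review rather than silently accepted.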