DOCUMENT Magazine - April 2008 - (Page 30)

in this operation. The procedure employs the same database that was used to generate the printed addresses in the first place. Some argue that table-based validation is useful only when the database is relatively small (and the client population of an insurance company is small compared with the entire US population), but the USPS would disagree. The USPS uses the ICR-based Remote Computer Reader (RCR) system to handle characters rejected by its letter-sorting equipment. Since these tend to be problem characters, RCR typically receives only the worst possible images for recognition. RCR uses the massive USPS database containing the name and address of every person and company in the country. The ICR results are compared against this huge database, which eliminates millions of characters from manual review. While large databases like the one behind RCR are expensive, their usefulness (in terms of labor savings, time savings and increased productivity) can easily outweigh the costs.

2 Data/Range Checks

Nearly every system includes data/range checks. In these situations, the field or character information is compared with a model to see that it conforms to a specific type, such as a particular alphanumeric sequence or a date (for example, the date of purchase on a bill of sale used to support a title guarantee application). A validation routine might also check that a number falls within a specific range or contains a certain number of digits. Specifying the correct number of digits also aids character segmentation in ICR applications, particularly in handprint character recognition.

3 Relationship Validation

Relationship validation is probably the least commonly used validation type, but it is the one that, in all likelihood, yields the greatest improvements in recognition performance, especially when combined with a redesign of the form that clarifies the data being validated. A simple, effective relationship test is to total the recognition results for a column of numbers and compare that sum with the recognition result for the total field. If the two match, the probability is extremely high that all the numbers were recognized correctly. On forms that report items and their values in property insurance (for example, a parking lot full of cars or a list of covered inventory), the item / price / total relationship is a validation routine that can yield spectacular results.

Surprisingly, certain types of relationship validation routines are used more often in full text ICR systems than in forms processing. In practice, they cross over into the realm of context analysis. Full text vendors know that specific letter combinations occur more frequently than others, so they use them to improve raw ICR accuracy and thereby facilitate word recognition. “U”, for example, will almost always follow the letter “Q”. The validation routines will rank “QVICK” as inferior to “QUICK” since the relationship of “Q” with “U” is stronger than with “V”. Along these lines, ICR and full text engines alike use letter combinations called trigrams to improve recognition accuracy. For instance, the trigram “THQ” is found in only one word in the English language: earthquake.

Check Sum Digits

Check sum digits can do a great deal to ensure accuracy in applications that involve strings of numeric data. Take, for example, VISA and MasterCard validation in credit card form processing applications. Credit card numbers are critical since each error can result in delays in processing a customer order. Most systems just capture the characters without any sort of validation other than limiting the field to digits. What appears to be a random sequence of numbers actually embeds a simple check that can eliminate most ICR errors. The Modulus 10 algorithm is a check sum routine that is very easy to add to applications such as order processing.

Modern credit cards have 16 digits. A few older credit cards still have only 13 digits. The first four digits correspond to the issuing bank. The next two digits specify the company program the owner is enrolled in (such as gold cards). VISA and MasterCard process enough cards that this is useful, but most ICR applications can make little use of that information. The last digit of the credit card is the check digit. The rest of the digits are processed through the algorithm and must produce the correct check digit. If the result matches, the credit card number is valid. Otherwise, the card is phony, or the ICR engine made a mistake.

The algorithm works like this. Every digit except the check digit is multiplied by two or one, alternating, working from right to left and starting with two.

Check Digit Algorithm

Card No.      4  2  0  6    3  0  0  0    1  6  3  4    9  6  8  8
Multiplier    2  1  2  1    2  1  2  1    2  1  2  1    2  1  2
Product       8  2  0  6    6  0  0  0    2  6  6  4   18  6 16

Remember, the last eight in the card number is not included in the calculation. Notice that in some cases the product has two digits (16 and 18 in this example). Next, add up all the digits in the products (18 counts as 1+8=9, not 18). This total is 62 (16 + 6 + 18 + 22 across the four groups). The check digit should equal the next highest multiple of 10 minus the total: in this example, 70 - 62 = 8. Since eight is the last digit on the credit card, this sequence is a valid credit card number.

One way to use this information is to start with the best-guess (highest-confidence) character in each position and, if the check fails, substitute the second-best-guess characters until the sequence passes. Since this is an integer-based algorithm, many calculations can be performed in parallel very quickly. Often, powerful validation routines such as this one can be easily implemented and yield tremendous benefits for overall system performance.

Arthur Gingrande, ICP, is co-founder and partner of IMERGE Consulting, a document-centric management consulting firm. Mr. Gingrande is a nationally recognized expert in document recognition technology. For more information, email arthur@imergeconsult.com or call 781-258-8181.

Paul Traite, ICP, is co-founder and CTO of AliusDoc LLC. AliusDoc offers mathResults for Recognition (pat. pending), vendor-independent, business rule add-on software based upon statistical analyses of recognition engine performance. Mr. Traite can be reached at 781-267-5264 or by email at paultraite@aliusdoc.com. ■

www.DOCUMENTmedia.com
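The Modulus 10 check described under Check Sum Digits, together with the idea of retrying with second-best-guess characters, can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (luhn_check_digit, is_valid, first_valid_reading) and the way ICR candidates are represented (one string per position, best guess first) are assumptions made for the example. It uses the card number from the article's worked example, 4206 3000 1634 9688.

```python
from itertools import product
from typing import Optional, Sequence


def luhn_check_digit(payload: str) -> int:
    """Return the Modulus 10 check digit for the digits that precede it."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        value = int(ch) * (2 if i % 2 == 0 else 1)
        # A two-digit product contributes the sum of its digits (18 -> 1 + 8).
        total += value // 10 + value % 10
    # Distance from the total up to the next multiple of 10 (0 if already there).
    return (10 - total % 10) % 10


def is_valid(card_number: str) -> bool:
    """True when the last digit equals the check digit of the other digits."""
    if not card_number.isdigit() or len(card_number) < 2:
        return False
    return luhn_check_digit(card_number[:-1]) == int(card_number[-1])


def first_valid_reading(candidates: Sequence[str]) -> Optional[str]:
    """Try candidate characters per position until a reading passes the check.

    `candidates` is a hypothetical representation of ICR output: one string
    per position, listing that position's guesses from best to worst.
    The first combination tried is the all-best-guess reading.
    """
    for combo in product(*candidates):
        reading = "".join(combo)
        if is_valid(reading):
            return reading
    return None


if __name__ == "__main__":
    print(luhn_check_digit("420630001634968"))
    print(is_valid("4206300016349688"))

    # Suppose the engine's best guess for the third digit is "8", with "0"
    # as its second-best guess (an invented misread for illustration).
    guesses = list("4206300016349688")
    guesses[2] = "80"
    print(first_valid_reading(guesses))
```

The search order here is simply the order produced by itertools.product, which is a simplification; a production system would more likely order the substitutions by recognition confidence and cap the number of combinations tried. Because any single misread digit changes the total modulo 10, at most one candidate in a given position can pass when the other positions are correct.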