Software Practice and Experience}, 24(4), 337-362 (April 1994) Alden H. Wright Computer Science University of Montana Missoula, MT 59812 January 8, 1993 wright@cs.umt.eduSUMMARY:
Given a text string, a pattern string, and an integer k, the problem of approximate string matching with k differences is to find all substrings of the text string whose edit distance from the pattern string is less than k. The edit distance between two string is defined as the minimum number of differences, where a difference can be a substitution, insertion, or deletion of a single character. An implementation of the dynamic programming algorithm for this problem is given that packs several characters and mod-4 integers into a computer word. Thus, it is a parallelization of the conventional implementation that runs on ordinary processors. Since a small alphabet means that characters have short binary codes, the degree of parallelism is greatest for small alphabets and for processors with long words. For an alphabet of size 8 or smaller and a 64 bit processor, a 21-fold parallelism over the conventional algorithm can be obtained. Empirical comparisons to the basic dynamic programming algorithm, to a version of Ukkonen's algorithm, and to the algorithm of Galil and Park, are given.
KEY WORDS | Approximate string matching Approximate string searching Edit distance Within-word parallelism Parallel algorithm Parallel computation |