Hi All... I'm working on an application where I have a 5-digit manufacturing code - each digit may be alpha or numeric. Each code that I receive should be in a table where I can find additional information - so basically, it could be as simple as:
SELECT * FROM MyTable WHERE MfgCode = 'XY12Z'
But my problem is that in the event the code is not in the table, I need to find the next best fit. Say it matches all except for one or two columns... My current thought is to break that column into 5 individual columns and then start generating some pretty horific SQL statements involving all 5 columns or 4 columns or 3 columns and all the combinations thereof... Such as
WHERE MfgCode1='X' AND MfgCode2='Y', etc...
That could get pretty messy pretty fast.... Anyone have any other thoughts?
Thanks for the help!! -- Curt
This is an interesting problem. Does the order of the mfgcode matter when doing the "almost" lookup? ie.. would it be considered a match if you run the query: SELECT * FROM MyTable WHERE MfgCode = 'XY12Z' doesn't produce any results, but the query SELECT * FROM MyTable WHERE MfgCode = 'YX12Z' does?|||Hi,
There a clean solution to build advanced sql searchs : using Regular expressions. Unfortunatly this feature is enabled only in SQL Server 2005.
Here a link about using Regular expressions with SQL Server 2005 and C#.
You will need to create a Regular expression Function that seems like :
Code Snippet
using System;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class RegExBase
{
[SqlFunction(IsDeterministic = true, IsPrecise = true)]
public static bool RegExMatch(string pattern, string matchString)
{
Regex r1 = new Regex(pattern.TrimEnd(null));
return Regex.Match(matchString.TrimEnd(null)).Success;
}
};
Then u can run this kind of queries :
SELECT * FROM MyTable WHERE dbo.RegExMatch('XY12?', MfgCode) = 1
For SQL Server 2000 this feature is not enabled by default. However, some free and commercial tools are available.
HTH.
Hayder Marzouk
|||Rather than breaking into columns, try breaking into rows. You'll
need a numbers table with 1 to 5 as below
CREATE TABLE Numbers(Num INT PRIMARY KEY)
INSERT INTO Numbers(Num) VALUES(1)
INSERT INTO Numbers(Num) VALUES(2)
INSERT INTO Numbers(Num) VALUES(3)
INSERT INTO Numbers(Num) VALUES(4)
INSERT INTO Numbers(Num) VALUES(5)
This query will give the number of matches at each of the 5 digits
DECLARE @.MfgCode CHAR(5)
SET @.MfgCode = 'XY12Z'
SELECT MfgCode,
SUM(CASE WHEN SUBSTRING(MfgCode,Num,1) = SUBSTRING(@.MfgCode,Num,1)
THEN 1 ELSE 0 END) AS NumberOfMatches
FROM MyTable
CROSS JOIN Numbers
WHERE Num BETWEEN 1 AND 5
GROUP BY MfgCode
HAVING SUM(CASE WHEN SUBSTRING(MfgCode,Num,1) = SUBSTRING(@.MfgCode,Num,1)
THEN 1 ELSE 0 END)>=3
ORDER by NumberOfMatches DESC
Wow... Thanks for the responses. I've got stuff to chew on now. : )
Tim, yes the order does matter. This "manufacturing code" is actually a VIN taken off of a motor vehicle - those 5 digits serve as a hash value to get to some detailed info describing the rest of the VIN. Often when transcribing this string off the vehicle, a digit or two is mis-read. For instance "6" and "G" may be sometimes confused. Also VIN's can't have the letter I (eye) - so it wont be confused with a 1 (one). But sometimes folks will write down and later enter the letter... Needless to say, my code's got to tolerate all of these human mistakes!
Thanks again, All. I really appreciate the help. -- Curt
|||I am kind of interested in this "almost" match. Does this mean that MfgCode has multiple meanings in the single value? It might be that the suggestion to break into multiple rows is the exact correct answer, but it really depends on the symantics of the column. Like does 5 characters mean five different things? Or do dome of them tie together?If five different things, then like this:
MyTable
=======
MyTableKey
Primary key (MyTableKey)
MyTableCode
==========
MyTableKey
MfgCode
Primary key (MyTableKey, MfgCode)
This will make your queries very straightforward to write, using a join and a count you can see what criteria matches the most closely by saying:
select myTableKey, count(*)
from myTable
join MyTableCode
on myTable.myTableKey = myTableCode.myTableKey
where myTableCode.MfgCode in ('X','Y','1','2','Z')|||
Hmmm...it sounds like it might be a "decent" idea to write yourself a function that performs this logic. This function will need some of the logic you just described in it to catch matches...and, in most likelihood, it won't be all that fast. This sample function may give you some ideas to get you started:
create function fn_fuzzyvalues
(
@.val1 varchar(10),
@.val2 varchar(10)
)
returns bit
as
begin
declare @.retval bit, @.maxlen tinyint, @.i tinyint
if @.val1 = @.val2
return(1)
set @.maxlen = case when len(@.val2) > len(@.val1) then len(@.val2) else len(@.val1) end
set @.i = 1
while(@.i <= @.maxlen)
begin
--logic
if substring(@.val1,@.i,1) = substring(@.val2,@.i,1)
--more logic to determine your match threshold
--if matches >= threshold then @.retval = 1 else 0
set @.i = @.i + 1
end
return(@.retval)
end
|||Louis, the digits are mutually exclusive. They really dont mean anything to each other. They can be alpha-numeric with some exceptions where certain characters are not allowed.
Thanks everyone for the help. I'm leaning toward Mark's approach where he suggested breaking into rows instead of columns... I'm certainly studying his approach right now. And as usual, I learn something new about SQL every time I come here...
Thanks again for the help, everyone! -- Curt
No comments:
Post a Comment