How to find special characters in DB2?

I have a DB2 database containing millions of records. I found that some char () or varchar () fields contain special characters that should not be stored. I assume that the application received corrupted data or some code did it.

In any case, I want to find records that contain this broken data, which are special characters (not alphabetic).

I tried to find a way using the query, but could not. Does anyone know a good request or advice?

+5
source share
6 answers

You can use the DB2 TRANSLATE() function to highlight non-alphanumeric characters. Note that this will not work in Oracle compatibility mode, because in this case, DB2 will treat empty strings as NULL, as Oracle does.

 SELECT * FROM yourtable WHERE LENGTH(TRANSLATE( yourcolumn, '', -- empty string 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' )) > 0 -- after translating ASCII characters to empty strings -- there still something left 
+14
source

I know this is an older thread ... but after reading a ton ... it was my exact problem, and here is the solution I came up with to identify the lines of problems ... so that I can go in and fix them manually. FYI - the problem for me is happening because users copy / paste from Word into my application. Yes, I know that we need to fix this before saving ... but we have big fish to fry.

SELECT * FROM TABLE_A where ASCII(TRIM(TRANSLATE( COLUMN_A, ' ', -- empty string '()<>!;%$#* ?@ +&^=-":/''.,0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' ))) not in (10,64)

Some notes:

  • We are using iSeries DB2 and it works great
  • Be sure to keep all spaces in the translation function ... it requires 1 space forever, which you use
  • In the 3rd parameter of the translation function, there are two single quotes next to each other, and the first simply escapes the other (for those who may not know)
+7
source

You can use regex to get invalid characters. However, this process is very expensive because you have to read all the data and then process it.

To use regular expressions in DB2, you need to adapt the environment because this function is not available for SQL during the installation process. You have three options:

Once you have defined a regular expression to ignore valid characters (something like /[^a-zA-Z0-9]/ ), you can execute it in the database. Remember to retrieve another column where you can find the row (for example, the column identifier) ​​and then reschedule updates or delete to trim invalid characters.

If you do not know how to use the regular expression, here you have a good source of information: http://www.regular-expressions.info/ Specially http://www.regular-expressions.info/charclass.html

There is a regular expression question: Regular expressions in DB2 SQL

+2
source

If by special characters you mean non-printable characters, you can use this:

 select yourfield, hex(trim(yourfield)),TRANSLATE( yourfield, ' ', x'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F') from yourtable where yourfield <> TRANSLATE( yourfield, ' ', x'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F') 

You will probably see that your odd characters are HEX() 40.

In any case, if you know the HEX() your odd / special characters, you can use this approach and then replace them with a space or whatever you want forever:

 Update yourtable set yourfield= Translate(yourfield, ' ', x'000102030405060708090A0B0C0D0E101112131415161718191A1B1C1D1E202122232425262728292A2B2C2D2E303132333435363738393A3B3C3D3E0F1F2F3F') where yourfield <> Translate(yourfield, ' ', x'000102030405060708090A0B0C0D0E101112131415161718191A1B1C1D1E202122232425262728292A2B2C2D2E303132333435363738393A3B3C3D3E0F1F2F3F') 
+2
source

This query worked for me in the past in iSeries DB2.

 select * from db/table where posstr(field, x'3F') > 0 

The problem is that you have to be sure that you are looking for the hexadecimal value in the string. I had a similar situation where I was sure that the hexadecimal code for the character was x'3F, but when I adjusted the character without a visible image, it was actually x'22. You might want to highlight the character that gives you the problem and see what the meaning is.

 select hex(substr(field, 21,1)) from db/table where posstr(field, 'StringBeforeCharacter') > 0 
-1
source

You can use below SQL, which will work fine.

select col 1 from table where col1 like '%'|| chr(10) || '%';

-2
source

Source: https://habr.com/ru/post/1489622/


All Articles