cancel
Showing results for 
Search instead for 
Did you mean: 

HVR - TrimWhiteSpace TableProperty - performance cost?

terryc_mufg
New Contributor III

Hi,

We are replicating between SQL Server and Snowflake using HVR 6 / LDP.

One consideration impacting reporting against both environments is the different behaviour when it comes to handling varchar data with trailing spaces.

SQL Server (intuitively..?)  trims/ignore trailing spaces when performing JOIN/WHERE clause operations.  Whereas, Snowflake is more strict on comparisons and will include SPACEs as valid data to compare etc.

The "right-thing" is to fix the data in source - but this will not a be quick task - and will impact deadlines etc.

Spotted the "TrimWhiteSpace" table property action - which we could apply on the TARGET (Snowflake) location for ALL tables etc.

Does anybody have experiences to share on this feature?   Initial concerns around performance and impact against integration would be my initial reaction..

Cheers

Terry

 

 

 

 

1 ACCEPTED SOLUTION

Hi Terry,

Every function will have a performance impact, but I believe this will have the lowest. Otherwise you can ofcourse add the functions to the join statement with probably has a higher impact. If you handle it during replication you have to pay the price only once.

Cheers,

Richard

View solution in original post

7 REPLIES 7

rikthefrog
Fivetranner
Fivetranner

Hi Terry,

What do you want to accomplish with the trimwhitespaces? Do you want to remove the trailing spaces for the compare functionality?
I would say it makes mostly your life easier. The other solution would be to create integrate and capture expressions. Which is more cumbersome and requires the concept of context.
Although I didn;t tested it I would say that the removal of trailing whitespaces will be minimal it is only a function on the value and will not force the database to do full table scans or ordering which are costly operations on a database.

Cheers,

Richard

terryc_mufg
New Contributor III

Hey Richard, thanks for response.

The main issue is that Snowflake reports are producing incorrect results with the same SQL statements run against SQL Server (source).     We are finding more examples as testing is progressing!

The reason - is trailing spaces in the data - as Snowflake will include spaces in comparisons.

We could try and change the Snowflake queries with TRIM commands etc - but that will impact query performance.

The 'TrimWhiteSpace' option looked like a good "catch all" option - but assume there will be a performance cost to pay.

Thanks

Terry

 

Hi Terry,

Every function will have a performance impact, but I believe this will have the lowest. Otherwise you can ofcourse add the functions to the join statement with probably has a higher impact. If you handle it during replication you have to pay the price only once.

Cheers,

Richard

terryc_mufg
New Contributor III

Hi,

Thought I would share my findings....  We ran some Refresh tests against a test table ( ~100 mil rows) loading into Snowflake.  We ran repeated refreshes, with settings  :-

  1. BASELINE :  No TrimWhiteSpace TableProperties Action   = 1640 sec ( 27 min , 20s)
  2. With TrimWhiteSpace TableProperties - SOURCE (All tables) = 1667 sec  ( 27min, 50s )
  3. With TrimWhiteSpace TableProperties - TARGET (All tables) = 17771 sec ( 28 min 31s)

    Checking the target data in Snowflake, we saw Trimmed data for Test #3 only, and Test#2 did not appear to change the data. 

    Based on my calcs, the above feature shows a 4% increase in overall refresh time.  Will need to test against different sized tables to see if this result is consistent .

Hope this is useful.

Cheers

Terry