Blog Post

SQL Server Integration Services (SSIS) Blog
4 MIN READ

Lookup cache modes

SSIS-Team's avatar
SSIS-Team
Copper Contributor
Mar 25, 2019
First published on MSDN on Oct 18, 2008

Over the past couple of months I've been putting together a presentation on the Lookup Transform. I presented most of it as a Chalk Talk at the MS BI Conference last week, and from the evaluation scores, it seems like it was pretty well received. I'll be splitting up some of its content into a series of blog posts over the next little while. If you're interested in seeing the whole talk, it will also be shown at the SSWUG Virtual Conference in November.

----

The most important setting of the Lookup Transform is the Cache Mode - it can greatly impact your data flow performance, and affects overall package design. Because of its importance, we made it the first thing you see in the new 2008 Lookup UI. I feel this is a great improvement over 2005, where the cache mode was abstracted away - see Michael Entin's post for more details.

This blog post describes the three cache modes, how they work, and best practices around using them. Note that these cache modes apply when you're using an OLE DB connection manager - using the new Cache connection manager is similar to using a Full Cache mode.

Full Cache

The default cache mode for the lookup is Full cache. In this mode, the database is queried once during the pre-execute phase of the data flow. The entire reference set is pulled into memory. This approach uses the most memory, and adds additional startup time for your data flow, as all of the caching takes place before any rows are read from the data flow source(s). The trade off is that the lookup operations will be very fast during execution. One thing to note is that the lookup will not swap memory out to disk, so your data flow will fail if you run out of memory.

When to use this cache mode

  • When you're accessing a large portion of your reference set
  • When you have a small reference table
  • When your database is remote or under heavy load, and you want to reduce the number of queries sent to the server

Keys to using this cache mode

  • Ensure that you have enough memory to fit your cache
  • Ensure that you don't need to pick up any changes made to the reference table
    • Since the lookup query is executed before the data flow begins, any changes made to the reference table during the data flow execution will not be reflected in the cache

Partial Cache

In this mode, the lookup cache starts off empty at the beginning of the data flow. When a new row comes in, the lookup transform checks its cache for the matching values. If no match is found, it queries the database. If the match is found at the database, the values are cached so they can be used the next time a matching row comes in.

Since no caching is done during the pre-execute phase, the startup time using a partial cache mode is less than it would be for a full cache. However, your lookup operations would be slower, as you will most likely be hitting the database more often.

When running in partial cache mode, you can configure the maximum size of the cache. This setting can be found on the Advanced Options page of the lookup UI. There are actually two separate values - one for 32bit execution, and one for 64bit. If the cache gets filled up, the lookup transform will start dropping the least seen rows from the cache to make room for the new ones.

In 2008 there is a new Miss Cache feature that allows you to allocate a certain percentage of your cache to remembering rows that had no match in the database. This is useful in a lot of situations, as it prevents the transform from querying the database multiple times for values that don't exist. However, there are cases where you don't want to remember the misses - for example, if your data flow is adding new rows to your reference table. The Miss Cache is disabled by default.

When to use this cache mode

  • When you're processing a small number of rows and it's not worth the time to charge the full cache
  • When you have a large reference table
  • When your data flow is adding new rows to your reference table
  • When you want to limit the size of your reference table by modifying query with parameters from the data flow

Keys to using this cache mode

No Cache

As the name implies, in this mode the lookup transform doesn't maintain a lookup cache (actually, not quite true - we keep the last match around, as the memory has already been allocated). In most situations, this means that you'll be hitting the database for every row.

When to use this cache mode

  • When you're processing a small number of rows
  • When you have non-repeating lookup indexes
  • When your reference table is changing (inserts, updates, deletes)
  • When you have severe memory limitations

Keys to using this cache mode

  • Ensure that the partial cache mode isn't the better choice

---

To find out more on how to implement the look up transform, please see these books online entries:

Updated Mar 25, 2019
Version 2.0
No CommentsBe the first to comment