Indexing question

Copper Contributor

Hello.

 

I have a table with about 700,000+ rows.

 

It's quite wide with some large columns. It contains product information.

 

The table has an auto-incrementing id column which is the PK

 

It also has a column called ManufacturerMarking which is nvarchar(100), not unique (but almost) and a column called [Supplier Code] which is integer, contains a 4 digit number also not unique but the combination of those two columns is definitely unique.

 

I have a view based on this table that calculates some stuff and joins with a suppliers table on the supplier code.

 

If I select the top 100 rows of this view the query is almost instant, which is great. However, the business wants to order by ManufacturerMarking. If I sort the table by that column the same query takes about 7 seconds (running on an Azure S4).

 

I'm currently leaving it up to Azure SQL to manage the non-clustered indexes. It's created a couple automatically which I can show here if needed.

 

The data is being surfaced in Power Apps and also via an API I have created in an Azure function. In Power Apps the users can filter by created date (datetime2), supplier name or via a search across a bunch of columns. These get passed in via the Power Platform SQL connector, which is a REST API that converts the Odata queries into a SELECT statement which performs quite well unless I try to send the order by clause with it.

The API doesn't allow sorting the data, but if I alter the view to sort the source table by ManufactureMarking, I see the same performance hit as if I send in the Order by clause via the Odata query from Power Apps.

 

What do I need to do with the indexes to improve the performance of this table/view when ordered by the ManufactureMarking? I looked into changing the clustered index of the table to ManufactureMarking but I'd have to drop the clustered index off the PK, and create one for this non-unique column. It all seems a bit daunting and maybe not advisable to do that. I don't want to accidentally drop or truncate the table (yes I do have backups but the outage would be embarrassing) or impact the performance in unexpected ways.

 

1 Reply

@williampage 

 

Hey, Will.

 

I'm not sure I've interpreted everything correctly, but I've got to start somewhere so here goes.

 

Firstly, if this table is part of a larger system then you most definitely don't want to mess around with the clustered index on the primary key. That will cause performance issues elsewhere - you can almost guarantee it since any foreign key references back to this table will be referring to the primary key.

 

If the table is more isolated, meaning few foreign key dependencies on its primary key and infrequently featuring in JOIN-oriented queries that use the primary key in the "ON" clause, then you may get away with changing the clustered key from the current primary key to the ManufacturerMarking, but I'd be very surprised if this is the case.

 

Just to be clear, dropping an index will not delete your data. It will probably kill performance until you re-create some meaningful replacement but you won't lose your data. I do prefer using T-SQL for DROP commands since you really have to not be paying attention to type something like "DROP TABLE" when you mean "DROP INDEX". Even if you type "DROP INDEX <yourTableName>", that's simply going to throw an error and fail. But if the GUI is your preferred approach, just be careful you're right-clicking the right thing!

 

That's the "easy" advice.

 

With respect to the rest, the first thing to check is that a non-clustered index exists for just the ManufacturerMarking column - no extra columns included (that's assuming you're not using a WHERE clause to filter the table's rows). I have to point out that I'm going with "non-clustered" as a best guess on how the wider system might hang together, as we only have a small window into the bigger picture here so far.

 

On the surface, what you've described does meet some of the criteria outlined here for a clustered index but as I just mentioned, if it truly is the beating heart of many relationships then I'd be willing to bet the data should be kept in the ordering it currently is. And this speaks to one key difference between clustered and non-clustered: clustered indexes result in the row data being shifted around so that each index key is stored sequentially.

 

Hypothetically, if nothing else has formal relationships with this table and you can use a clustered index, then dropping the existing clustered index and creating a new one against the ManufacturerMarking would mean:

 

  1. The data/rows wold be stored in alphabetical order based on the ManufacturerMarking column;
  2. You would no longer need an ORDER BY clause in your view definition since ordering would no longer be necessary.

But again - and I keep coming back to this - I'd be surprised if the table is really that isolated. Only you can figure that out through inspecting the dependencies (things like foreign keys, views, stored procedures, functions, etc.)

 

Getting back on track, if you have got an existing non-clustered index targeting only the ManufacturerMarking column, then there may well be not much more you can do - particularly as you said it's a very wide table (makes me wonder if you can't use a view to cut down the columns, which will help). But you can run the following superficial T-SQL check just to get some certainty around whether the bottleneck is definitely in SQL, or external to SQL.

 

If these two queries complete in roughly the same time, then your issue isn't the SQL instance at all.

 

Let the schema name be "dbo" and table name be "Products", since I don't know either (and it's not important that I do):

 

-- Query 1
SELECT COUNT(*) AS [Count] FROM dbo.Products AS [p] WITH (nolock)
GO

-- Query 2
SELECT COUNT(*) AS [Count] FROM dbo.Products AS [p] WITH (nolock) ORDER BY p.ManufacturerMarking

 

If query 2 still takes around 7 second then the bottleneck is definitely within the SQL instance, and at this stage based on what we know, I don't have any better suggestions other than the generalised indexing overview above.