TPC-E – Raising the Bar in OLTP Performance
Published Mar 23 2019 11:04 AM
Microsoft
First published on MSDN on Oct 23, 2008

Glenn Paulley, Director of Engineering at Sybase iAnywhere, posted a commentary titled “The State of TPC-E” on his blog three weeks ago (10/3/08).  A better title would have been “All TPC-E Results Are On Microsoft SQL Server.  Why?”  Mr. Paulley takes issue with Brian Moran’s statement that “the most rational answer is that Oracle and IBM have tried to top Microsoft’s numbers and simply can’t”.  He says that while this may be true, he doubts it, and that there are other plausible reasons why DB2 and Oracle have yet to publish any TPC-E results.  Curiously, he doesn’t say why Sybase hasn’t published TPC-E results.  Since he is, presumably, in a position to know, one can only conclude that he would rather not say.  Readers can reach their own conclusions about what that might mean.



To his credit, he cites an IBM whitepaper explaining that TPC-E was designed to be more realistic than TPC-C.  The whitepaper details numerous ways in which TPC-E is far superior to TPC-C.  Let’s compare the two.  As the table below shows, in TPC-E the schema is substantially richer and more complex, there are twice as many transactions, and only TPC-E requires essential capabilities such as referential integrity and RAID-protected storage.





| | TPC-C | TPC-E |
|---|---|---|
| Schema | | |
| Number of database tables | 9 | 33 |
| Foreign keys | 9 | 50 |
| Tables with foreign keys | 7 | 27 |
| Check constraints | 0 | 22 |
| Partitioning Characteristic | unrealistic; single dimension common to 8 of 9 tables | realistic; two independent dimensions |
| Transactions | | |
| Number of transactions | 5 | 10 |
| Database roundtrips per transaction | 1 | min 1; max 5 |
| Capabilities | | |
| Referential Integrity Required | No | Yes |
| Storage Protection (e.g. RAID) for Database Required | Log Only | Everything |
| Timed Database Recovery test | No | Yes |
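
To make the "Foreign keys" and "Check constraints" rows concrete, here is a minimal sketch of what those two kinds of constraint enforce. It uses Python's built-in sqlite3 module purely for convenience. The tables `customer` and `trade_order` and their columns are hypothetical names chosen for illustration; they are not taken from the TPC-E schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one foreign key and one check constraint."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL)"
    )
    # Hypothetical table: every order must point at an existing customer
    # (referential integrity) and carry a positive quantity (check constraint).
    conn.execute(
        """
        CREATE TABLE trade_order (
            t_id  INTEGER PRIMARY KEY,
            c_id  INTEGER NOT NULL REFERENCES customer (c_id),
            t_qty INTEGER NOT NULL CHECK (t_qty > 0)
        )
        """
    )
    conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
    return conn


def try_insert(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO trade_order VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_demo_db()
    print(try_insert(conn, (100, 1, 50)))   # existing customer, positive quantity
    print(try_insert(conn, (101, 99, 50)))  # customer 99 does not exist
    print(try_insert(conn, (102, 1, 0)))    # quantity violates the check constraint
```

In a benchmark that does not require referential integrity, the second and third rows could be stored without complaint; the database does the extra validation work only when the constraints are declared and enforced.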





Mr. Paulley chooses to focus on the query complexity of TPC-E.  While that’s somewhat interesting, a comparison to TPC-C would have provided important context.  For example, TPC-E has 156 DML statements.  Although TPC-C doesn’t include pseudo-SQL the way that TPC-E does, if it did and followed the TPC-E style, it would have fewer than 30 DML statements.  By this measure, TPC-E has more than five times as many distinct DML operations as TPC-C.



But more importantly, TPC-E is not and was never intended to be a query optimizer test.  The pseudo-SQL code in TPC-E is an example, not a requirement.  Unlike TPC-H, which strictly limits changes to the specified SQL, TPC-E leaves test sponsors free to rewrite the SQL any way they like as long as it is functionally equivalent.  One vendor might rewrite it to remove all joins while another might rewrite it to include more joins or more complex joins.  The same is true of group by and order by clauses.  In our view, Mr. Paulley’s objection that TPC-E isn’t a good optimizer test is misplaced.
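
As a rough illustration of what "functionally equivalent" means here, the sketch below answers the same question two ways: once with a join, and once with two single-table statements whose results are combined by the caller. The `customer` and `account` tables, their columns, and both function names are hypothetical and exist only for this example. Whether a particular rewrite is acceptable in an audited result is governed by the TPC-E specification, not by this sketch.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical customer/account tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (c_id INTEGER PRIMARY KEY, c_name TEXT NOT NULL);
        CREATE TABLE account  (a_id INTEGER PRIMARY KEY,
                               c_id INTEGER NOT NULL,
                               a_balance REAL NOT NULL);
        INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO account  VALUES (10, 1, 500.0), (11, 1, 250.0), (12, 2, 75.0);
        """
    )
    return conn


def balances_with_join(conn: sqlite3.Connection, c_id: int) -> list:
    """Variant A: one statement that joins customer to account."""
    return conn.execute(
        "SELECT c.c_name, a.a_id, a.a_balance "
        "FROM customer AS c JOIN account AS a ON a.c_id = c.c_id "
        "WHERE c.c_id = ? ORDER BY a.a_id",
        (c_id,),
    ).fetchall()


def balances_without_join(conn: sqlite3.Connection, c_id: int) -> list:
    """Variant B: two single-table statements, combined outside the database."""
    name_row = conn.execute(
        "SELECT c_name FROM customer WHERE c_id = ?", (c_id,)
    ).fetchone()
    if name_row is None:
        return []  # unknown customer: the join variant also returns no rows
    accounts = conn.execute(
        "SELECT a_id, a_balance FROM account WHERE c_id = ? ORDER BY a_id",
        (c_id,),
    ).fetchall()
    return [(name_row[0], a_id, balance) for a_id, balance in accounts]


if __name__ == "__main__":
    conn = build_demo_db()
    print(balances_with_join(conn, 1) == balances_without_join(conn, 1))
```

Both variants return the same rows for any customer ID, yet they give a query optimizer very different work to do, which is why the sample SQL in the spec says little about optimizer quality.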



After discussing query complexity, Mr. Paulley offers four reasons why Microsoft is the only database vendor publishing TPC-E results.



·         “TPC-E is a moving target.”  While it’s true that the TPC-E spec is up to version 1.6.0, the assertion that the workload has changed significantly is unsupported by the facts.  None of the transactions has changed in any way that impacts performance.  All spec revisions have been classified as “minor” changes by the TPC, and results across all spec revisions are comparable.  The number of revisions to the spec since it was first released actually reflects a deep commitment by the members of the TPC-E committee to clean up rough edges and address areas of ambiguity before they become issues in published results.  A better gauge of the high quality of the TPC-E spec is that, to date, 18 results have been published by six vendors over 15 months, and there have been no compliance challenges.



·         “Both DBMS vendors and hardware suppliers have a substantial investment in TPC-C expertise.”  On this point we agree with Mr. Paulley.  But we draw different conclusions.  All of the major DBMS companies have spent years picking through every detail of TPC-C.  It has been optimized to such a degree that it long ago stopped driving customer-relevant engineering improvements.  TPC-C is 16 years old and has changed little since 1992.  Saying that we should continue using TPC-C because we know it so well is like saying that we should drive a horse and buggy because we have a lot of expertise in blacksmithing.  This is a mindset trapped in the past, and it doesn’t serve our customers.



·         “TPC-E isn’t that cheap.”  In fact, TPC-E is substantially less expensive to configure and run than TPC-C.  Two results from IBM within the last month prove the point.  As you can see in the table below, running on the same server, the TPC-C configuration cost more than five times as much as the TPC-E configuration.  Further, on this four-processor server, the TPC-C result had 1,361 disks with no data protection, while the TPC-E result had 400 disks with RAID-5.  Which is the more customer-relevant configuration?






| | TPC-C | TPC-E |
|---|---|---|
| Hardware | IBM System x3850 M2 | IBM System x3850 M2 |
| Procs / Cores / Threads | 4 / 24 / 24 | 4 / 24 / 24 |
| Performance | 684,508 tpmC | 729 tpsE |
| Price/perf | 2.58 $/tpmC | 457 $/tpsE |
| Total System Cost | $1,763,438 | $333,646 |
| Publication Date | 9/15/08 | 9/15/08 |
| Availability Date | 10/31/08 | 10/10/08 |
| Memory | 256 GB | 128 GB |
| Storage | 1,344 x 73.4GB disks; 16 x 500GB disks; 1 x 73GB | 400 x 73.4GB disks |
| Data Storage Protection | None | RAID-5 |
| TPC Result Details | Link | Link |



·         “Customers continue to desire and reference TPC-C results.”  Granted, TPC-C has stood the test of time.  But today it is outdated, over-optimized, and of questionable relevance.  Customers hold onto TPC-C because it is familiar and available, not because it is better.  Database vendors need to exercise leadership.  As Mr. Paulley says, “Microsoft is an early adopter of TPC-E”.  At this point, though, the early adopter window has passed.  TPC-E was ratified 20 months ago.  The first result was published 15 months ago.  There are 18 published results.  We believe that customers will readily embrace TPC-E as a superior benchmark as more results become available.
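
The cost comparison in the “TPC-E isn’t that cheap” bullet can be checked with a few lines of arithmetic. The sketch below copies its figures from the IBM results table above; the dictionary keys and function names are our own, chosen only for this example.

```python
# Figures copied from the IBM System x3850 M2 comparison table.
TPCC = {
    "total_cost_usd": 1_763_438,
    "throughput": 684_508,      # tpmC
    "disks": 1_344 + 16 + 1,    # all drives listed in the Storage row
}
TPCE = {
    "total_cost_usd": 333_646,
    "throughput": 729,          # tpsE
    "disks": 400,
}


def price_performance(result: dict) -> float:
    """Total system cost divided by reported throughput."""
    return result["total_cost_usd"] / result["throughput"]


def cost_ratio(a: dict, b: dict) -> float:
    """How many times as expensive configuration a is compared with b."""
    return a["total_cost_usd"] / b["total_cost_usd"]


if __name__ == "__main__":
    print(f"TPC-C price/perf: {price_performance(TPCC):.2f} $/tpmC")
    print(f"TPC-E price/perf: {price_performance(TPCE):.2f} $/tpsE")
    print(f"Cost ratio (TPC-C / TPC-E): {cost_ratio(TPCC, TPCE):.2f}")
    print(f"Disk counts: {TPCC['disks']} vs {TPCE['disks']}")
```

The cost ratio comes out at roughly 5.3, consistent with "more than five times," and the TPC-C drive count sums to 1,361. Note that the two price/performance figures are in different units ($/tpmC versus $/tpsE) and are not directly comparable; only the total system costs are.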



The more time that goes by, the more one is inclined to believe that Brian Moran is right – other database vendors aren’t publishing because they can’t beat the existing SQL Server results.  We invite Sybase and Mr. Paulley to prove us wrong.  We are confident that once Sybase runs TPC-E instead of just writing about it, Mr. Paulley will gain a new appreciation for just how challenging and technically rigorous TPC-E is compared with TPC-C.



Charles Levine


SQL Server Performance Engineering
