Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

- Home
- :
- SQL Server
- :
- SQL Server
- :
- Public Preview of Shortest Path on SQL Server 2019

By

Published
06-26-2019 02:14 AM
8,541
Views

Jun 26 2019
02:14 AM

- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Email to a Friend
- Printer Friendly Page
- Report Inappropriate Content

Jun 26 2019
02:14 AM

SQL Server 2017 and Azure SQL Database introduced native graph database capabilities used to model many-to-many relationships. The first implementation of SQL Graph introduced support for nodes to represent entities, edges to represent relationships and a new MATCH predicate to support graph pattern matching and traversal.

We are expanding the graph database capabilities with several new features. In this blog we discuss one of these features that is now available for public preview in SQL Server 2019, **Shortest Path**, which can be used to find a shortest path between two nodes in a graph. The shortest path function can also be used to compute a transitive closure or for arbitrary length traversals.

With CTP3.1, we are introducing a new function SHORTEST_PATH, which can be used inside MATCH to find a shortest path between any 2 nodes in a graph or to perform arbitrary length traversals. Users can specify a pattern they would like to search for in the graph using a regular expression style syntax.

To understand the new syntax and how shortest path works, from Adventure Works Cycles, a bicycle wholesaler that sells and manufactures bicycles. We have converted their bill of materials hierarchy into a graph, which has Product node and IsPartOf edge table. Download sample scripts here.

Now, let’s look at some sample queries to see how shortest path can help with recursive or arbitrary length traversals in a graph schema.

**Find all assemblies where a given product is used**

Imagine that the inventory of a given product (Bearing Ball) at Adventure Works Cycles is running low and that they would like to find all the assemblies in their manufacturing pipeline, where a Bearing Ball is used. This will tell them what assemblies will be affected due to the scarcity of a given product.

The following query demonstrates how shortest path can be used to find all assemblies that require a Bearing Ball in the bill of materials:

-- Find all the assemblies where Bearing Ball is used SELECT P1.ProductID, P1.Name,STRING_AGG(P2.Name,'->')WITHIN GROUP (GRAPH PATH)AS [Assembly],LAST_VALUE(P2.ProductID)WITHIN GROUP (GRAPH PATH)AS [Final ProductID] FROM PRODUCT P1, PRODUCT FOR PATH P2, ISPARTOF FOR PATH IPO WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+)) AND P1.ProductID = 2 ORDER BY P1.ProductID

This query starts from ProductID = 2 (Bearing Ball) and keeps traversing up the BOM tree until it finds all products that contain a Bearing Ball.

Here are first few lines from the result set:

The STRING_AGG function on P2.Name fetches all the nodes that were traversed in this path, thereby, giving us a good view of assembly line up to ‘Touring-3000 Yellow, 58’ bike. Other rows in the result set show the other assemblies which have dependency on Bearing Ball at some level.

This query uses a lot of new syntax. Let’s understand what this new syntax does.

**SHORTEST_PATH**

The SHORTEST_PATH function lets you find:

- A shortest path between two given nodes/entities
- Single source shortest path(s).
- Shortest path from multiple source nodes to multiple target nodes.

It takes an arbitrary length pattern as input, described below. It returns a shortest path that exists between two nodes. This function can only be used inside MATCH. It accepts an arbitrary length pattern and finds a shortest path in the graph, which matches that pattern. If there exist two or more shortest paths of the same length between any pair of source and destination node(s), the function will return the one that was found first during traversal. Note that, an arbitrary length pattern can only be specified inside a SHORTEST_PATH function. Finding weighted shortest paths, all paths or all shortest paths is not supported yet.

**Arbitrary Length Pattern**

This pattern includes the nodes and edges that must be traversed repeatedly until the desired node is reached or until the maximum number of iterations as specified in the pattern is met. Each time the query is executed, the result of executing this pattern will be a collection of nodes and edges traversed along the path from the start node to the end node. This is a regular expression style syntax pattern and the following two pattern quantifiers are supported in this release:

**‘+’ ** : Repeat the pattern 1 or more times. Terminate as soon as a shortest path is found.

We recommend that users provide a start and an end node, along with the pattern, whenever ‘+’ is used in the pattern predicate. For a big graph, a query with ‘+’ and no terminating condition may take a long time to finish. Refer to the example shown in ‘Find Shortest path between 2 given nodes’ section below, to see how filters on start and end node can be applied.

**{1,n}** : Repeat the pattern 1 to ‘n’ times. Terminate as soon as a shortest is found or when the max number of iterations ‘n’ specified in the pattern is reached.

**FOR PATH**

FOR PATH must be used with any node or edge table name in the FROM clause, which will participate in an arbitrary length pattern. FOR PATH tells the engine that the node or edge table will return an ordered collection representing the list of nodes or edges along the path from the start node to the end node. The attributes from these tables cannot be projected directly in the SELECT clause and cannot be filtered directly in the WHERE clause. To project or filter attributes from these tables, **graph path aggregate functions** must be used.

** **

**Graph Path Aggregate Functions **

Since the nodes and edges involved in arbitrary length pattern return a collection (of node(s) and edge(s) traversed in that path), users cannot project the attributes directly using the conventional tablename.attributename syntax. For queries where it is required to project attribute values from the intermediate node or edge tables, in the path traversed, we have extended aggregate functions to work with graph node or edge tables. In this release, the following graph path aggregate functions are available: STRING_AGG, LAST_VALUE, SUM, AVG, MIN, MAX and COUNT. These functions can be used in SELECT or ORDER BY clause.

**Find all the products which are up to 3 levels away from a given product.**

Now, if Adventure Works Cycles wants to find products in their BOM, which are 1 to 3 levels away from Bearing Ball, they can just modify the query shown in example above as follows:

SELECT P1.ProductID, P1.Name, STRING_AGG(P2.Name,'->') WITHIN GROUP (GRAPH PATH) AS [Assembly], LAST_VALUE(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS [Final ProductID] FROM PRODUCT P1, PRODUCT FOR PATH P2, ISPARTOF FOR PATH IPO WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2){1,3})) AND P1.ProductID = 2 ORDER BY P1.ProductID

Here is a sample result set:

**Find Shortest Path between 2 given nodes**

The queries shown above, start from a given node (Bearing Ball) and then search all paths in the graph starting from that node, until they cannot find any other paths that originate from this node and end at other nodes.

Imagine that Adventure Works Cycles wants to find the shortest path between 2 given nodes in the assembly. They can use following query to do this. This query will again start at ProductID = 2 (Bearing Ball) but terminates as soon as it finds ProductID = 768 (bike ‘Road-650 Black, 44’).

SELECT ProductID, Name, [Assembly], FinalProductID, Levels FROM ( SELECT P1.ProductID, P1.Name, STRING_AGG(P2.Name,'->') WITHIN GROUP (GRAPH PATH) AS [Assembly], LAST_VALUE(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS FinalProductID, COUNT(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS Levels FROM PRODUCT P1, PRODUCT FOR PATH P2, ISPARTOF FOR PATH IPO WHERE MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+)) AND P1.ProductID = 2 ) AS Q WHERE Q.FinalProductID = 768

Note that in this query, we used a subquery to apply filter on the last node traversed in the path (FinalProductID = 768). The filter on the FinalProductID in the outer query is pushed down while computing the shortest path. Hence, the query will terminate as soon as the FinalProductID is found. LAST_VALUE() function, which fetches the attribute values from last node traversed, cannot be used in the WHERE clause, since it is a window aggregate function.

By using the COUNT graph path aggregate function in select clause, we can see the number of hops/levels between the 2 ProductIDs. Here is the result set of this query:

In this query, the predicate/filter on last node (Q.FinalProductID) is pushed down. That is, the shortest path query terminates as soon as it finds ProductID = 768.

Writing a shortest path query with a terminating condition on the last node, as shown in the query above, is highly recommended. This is a much more optimal query than the ones which start at one node and then traverse the entire graph to find all the connections that exist from that node to others. For huge graphs, finding the shortest path from a given start node to all the other nodes in the graph, will be a very expensive operation.

**Conclusion**

Starting with SQL Server 2019 CTP3.1, you can use SHORTEST_PATH function to compute shortest path(s) between nodes in a graph, using a regular expression style search pattern. SHORTEST_PATH can only be used inside a MATCH function and works only with graph node and edge tables. This function also allows you to write transitive closure or arbitrary length traversal query from a given ‘start’ node.

**Next Steps**

Please download and give this feature a try. We are looking forward to hearing your comments or feedback for this feature.

8 Comments

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.