Public Preview of Shortest Path on SQL Server 2019

Shreya Verma · ‎Jun 26 2019

SQL Server 2017 and Azure SQL Database introduced native graph database capabilities used to model many-to-many relationships. The first implementation of SQL Graph introduced support for nodes to represent entities, edges to represent relationships and a new MATCH predicate to support graph pattern matching and traversal.

We are expanding the graph database capabilities with several new features. In this blog we discuss one of these features that is now available for public preview in SQL Server 2019, Shortest Path, which can be used to find a shortest path between two nodes in a graph. The shortest path function can also be used to compute a transitive closure or for arbitrary length traversals.

With CTP3.1, we are introducing a new function SHORTEST_PATH, which can be used inside MATCH to find a shortest path between any 2 nodes in a graph or to perform arbitrary length traversals. Users can specify a pattern they would like to search for in the graph using a regular expression style syntax.

To understand the new syntax and how shortest path works, from Adventure Works Cycles, a bicycle wholesaler that sells and manufactures bicycles. We have converted their bill of materials hierarchy into a graph, which has Product node and IsPartOf edge table. Download sample scripts here.

Now, let’s look at some sample queries to see how shortest path can help with recursive or arbitrary length traversals in a graph schema.

Find all assemblies where a given product is used

Imagine that the inventory of a given product (Bearing Ball) at Adventure Works Cycles is running low and that they would like to find all the assemblies in their manufacturing pipeline, where a Bearing Ball is used. This will tell them what assemblies will be affected due to the scarcity of a given product.

The following query demonstrates how shortest path can be used to find all assemblies that require a Bearing Ball in the bill of materials:

-- Find all the assemblies where Bearing Ball is used
SELECT 
	P1.ProductID, 
	P1.Name,
	STRING_AGG(P2.Name,'->') WITHIN GROUP (GRAPH PATH) AS [Assembly],
	LAST_VALUE(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS [Final ProductID]
FROM
	PRODUCT P1,
	PRODUCT FOR PATH P2,
	ISPARTOF FOR PATH IPO
WHERE 
	MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+))
	AND P1.ProductID = 2
       ORDER BY P1.ProductID

This query starts from ProductID = 2 (Bearing Ball) and keeps traversing up the BOM tree until it finds all products that contain a Bearing Ball.

Here are first few lines from the result set:

The STRING_AGG function on P2.Name fetches all the nodes that were traversed in this path, thereby, giving us a good view of assembly line up to ‘Touring-3000 Yellow, 58’ bike. Other rows in the result set show the other assemblies which have dependency on Bearing Ball at some level.

This query uses a lot of new syntax. Let’s understand what this new syntax does.

SHORTEST_PATH

The SHORTEST_PATH function lets you find:

A shortest path between two given nodes/entities
Single source shortest path(s).
Shortest path from multiple source nodes to multiple target nodes.

It takes an arbitrary length pattern as input, described below. It returns a shortest path that exists between two nodes. This function can only be used inside MATCH. It accepts an arbitrary length pattern and finds a shortest path in the graph, which matches that pattern. If there exist two or more shortest paths of the same length between any pair of source and destination node(s), the function will return the one that was found first during traversal. Note that, an arbitrary length pattern can only be specified inside a SHORTEST_PATH function. Finding weighted shortest paths, all paths or all shortest paths is not supported yet.

Arbitrary Length Pattern

This pattern includes the nodes and edges that must be traversed repeatedly until the desired node is reached or until the maximum number of iterations as specified in the pattern is met. Each time the query is executed, the result of executing this pattern will be a collection of nodes and edges traversed along the path from the start node to the end node. This is a regular expression style syntax pattern and the following two pattern quantifiers are supported in this release:

‘+’ : Repeat the pattern 1 or more times. Terminate as soon as a shortest path is found.

We recommend that users provide a start and an end node, along with the pattern, whenever ‘+’ is used in the pattern predicate. For a big graph, a query with ‘+’ and no terminating condition may take a long time to finish. Refer to the example shown in ‘Find Shortest path between 2 given nodes’ section below, to see how filters on start and end node can be applied.

{1,n} : Repeat the pattern 1 to ‘n’ times. Terminate as soon as a shortest is found or when the max number of iterations ‘n’ specified in the pattern is reached.

FOR PATH

FOR PATH must be used with any node or edge table name in the FROM clause, which will participate in an arbitrary length pattern. FOR PATH tells the engine that the node or edge table will return an ordered collection representing the list of nodes or edges along the path from the start node to the end node. The attributes from these tables cannot be projected directly in the SELECT clause and cannot be filtered directly in the WHERE clause. To project or filter attributes from these tables, graph path aggregate functions must be used.

Graph Path Aggregate Functions

Since the nodes and edges involved in arbitrary length pattern return a collection (of node(s) and edge(s) traversed in that path), users cannot project the attributes directly using the conventional tablename.attributename syntax. For queries where it is required to project attribute values from the intermediate node or edge tables, in the path traversed, we have extended aggregate functions to work with graph node or edge tables. In this release, the following graph path aggregate functions are available: STRING_AGG, LAST_VALUE, SUM, AVG, MIN, MAX and COUNT. These functions can be used in SELECT or ORDER BY clause.

Find all the products which are up to 3 levels away from a given product.

Now, if Adventure Works Cycles wants to find products in their BOM, which are 1 to 3 levels away from Bearing Ball, they can just modify the query shown in example above as follows:

SELECT 
	P1.ProductID, 
	P1.Name,
	STRING_AGG(P2.Name,'->') WITHIN GROUP (GRAPH PATH) AS [Assembly],
	LAST_VALUE(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS [Final ProductID]
FROM
	PRODUCT P1,
	PRODUCT FOR PATH P2,
	ISPARTOF FOR PATH IPO
WHERE 
	MATCH(SHORTEST_PATH(P1(-(IPO)->P2){1,3}))
	AND P1.ProductID = 2
ORDER BY P1.ProductID

Here is a sample result set:

Find Shortest Path between 2 given nodes

The queries shown above, start from a given node (Bearing Ball) and then search all paths in the graph starting from that node, until they cannot find any other paths that originate from this node and end at other nodes.

Imagine that Adventure Works Cycles wants to find the shortest path between 2 given nodes in the assembly. They can use following query to do this. This query will again start at ProductID = 2 (Bearing Ball) but terminates as soon as it finds ProductID = 768 (bike ‘Road-650 Black, 44’).

SELECT
	ProductID, Name, [Assembly], FinalProductID, Levels
FROM (
	SELECT 
		P1.ProductID, 
		P1.Name,
		STRING_AGG(P2.Name,'->') WITHIN GROUP (GRAPH PATH) AS [Assembly],
		LAST_VALUE(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS FinalProductID,
		COUNT(P2.ProductID) WITHIN GROUP (GRAPH PATH) AS Levels
	FROM
		PRODUCT P1,
		PRODUCT FOR PATH P2,
		ISPARTOF FOR PATH IPO
	WHERE 
		MATCH(SHORTEST_PATH(P1(-(IPO)->P2)+))
		AND P1.ProductID = 2
 ) AS Q
 WHERE Q.FinalProductID = 768

Note that in this query, we used a subquery to apply filter on the last node traversed in the path (FinalProductID = 768). The filter on the FinalProductID in the outer query is pushed down while computing the shortest path. Hence, the query will terminate as soon as the FinalProductID is found. LAST_VALUE() function, which fetches the attribute values from last node traversed, cannot be used in the WHERE clause, since it is a window aggregate function.

By using the COUNT graph path aggregate function in select clause, we can see the number of hops/levels between the 2 ProductIDs. Here is the result set of this query:

In this query, the predicate/filter on last node (Q.FinalProductID) is pushed down. That is, the shortest path query terminates as soon as it finds ProductID = 768.

Writing a shortest path query with a terminating condition on the last node, as shown in the query above, is highly recommended. This is a much more optimal query than the ones which start at one node and then traverse the entire graph to find all the connections that exist from that node to others. For huge graphs, finding the shortest path from a given start node to all the other nodes in the graph, will be a very expensive operation.

Conclusion

Starting with SQL Server 2019 CTP3.1, you can use SHORTEST_PATH function to compute shortest path(s) between nodes in a graph, using a regular expression style search pattern. SHORTEST_PATH can only be used inside a MATCH function and works only with graph node and edge tables. This function also allows you to write transitive closure or arbitrary length traversal query from a given ‘start’ node.

Next Steps

Please download and give this feature a try. We are looking forward to hearing your comments or feedback for this feature.

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Public Preview of Shortest Path on SQL Server 2019