Filestream and Full-Text - Full Solution for Document Indexing in SQL Server

Published Feb 10 2019 05:23 PM 1,589 Views
Microsoft

First published on MSDN on Oct 23, 2017

Updated: Oct 19, 2021


Recently someone was asking if there is a simple SQL Server solution where you would generate text documents and simply pass them to SQL Server to catalog without writing an application to do this. Also, would like to be able to index and search the context of these documents.

The solution in SQL Server would involve Filestream and Full-Text technologies. There are two alternatives:

    1. Use FileTable
    2. Use a regular table

 

1. Using a FileTable

1. Enable Filestream: Go to SQL Server Configuration Manager, properties on the SQL Server service and under FILESTREAM enable FILESTREAM for T-SQL and file I/O.

 

JosephPilov_0-1634692251505.png

 

2. Configure FILESTREAM in SQL Server

 

use master;
exec sp_configure 'filestream access level', 2

--AT THIS POINT, YOU NEED TO RESTART SQL SERVER

 

3. Restart SQL Server service 

 

4. Create a database with a FILESTREAM filegroup

 


DROP DATABASE FileTableDB_forTextDocs
go
CREATE DATABASE FileTableDB_forTextDocs
ON PRIMARY (
 NAME = N'FileTableDB',
 FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\FileTableFileTableDB.mdf' ),
 FILEGROUP FilestreamFG CONTAINS FILESTREAM (
 NAME = FileStreamGroup1,
 FILENAME= 'C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\FileTableData' )
LOG ON (
 NAME = N'FileTableDB_Log',
 FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL14.SQL2017\MSSQL\DATA\FileTableFileTableDB_log.ldf' )
 WITH FILESTREAM (
 NON_TRANSACTED_ACCESS = FULL,
 DIRECTORY_NAME = N'TextesFileTable')
go

 

5. Validate the database with directory name FILESTREAM options

 

SELECT DB_NAME ( database_id ), 
 directory_name, non_transacted_access, 
 non_transacted_access_desc
FROM sys.database_filestream_options
WHERE directory_name is NOT NULL
GO

 

6. Create a FileTable in the database

 

use FileTableDB_forTextDocs
GO
CREATE TABLE dbo.TextDocuments AS FileTable
 WITH (
  FileTable_Directory = 'TextesFileTable',
  FileTable_Collate_Filename = database_default);
GO

 

7. Validate the table is created successfully and check its contents (no rows)

 

SELECT * FROM sys.filetables;
GO
SELECT * FROM sys.tables WHERE is_filetable = 1;
GO
--check if there is something in there - NOTHING YET
SELECT * FROM dbo.TextDocuments
GO

 

8. Find the location of the filetable root path (DIRECTORY_NAME).

It will look like this: \\SERVERNAME\WINDOWS_SHARE\TextesFileTable\TextesFileTable

 

SELECT FileTableRootPath('dbo.TextDocuments');

 

9. Using the share you discovered with FileTableRootPath as a destination path, create 3 files. Run these from Command Prompt

 

echo 1. this is a some text1 > \\SERVERNAME\SQL2017\TextesFileTable\TextesFileTable\TextFile1.txt
echo 2. this is a some text2 > \\SERVERNAME\SQL2017\TextesFileTable\TextesFileTable\TextFile2.txt
echo 3. this is a some text3 > \\SERVERNAME\SQL2017\TextesFileTable\TextesFileTable\TextFile3.txt

 

10. IMPORTANT:  Make Sure You Close The Share Because Keeping It Open In Windows Explorer Can Cause Blocking

 

11. Now check the FileTable to see if the files are tracked - should see 3 records

 

SELECT * FROM dbo.TextDocuments

 

12. Now to create a Full-Text index on this table, first create the catalog

 

CREATE FULLTEXT CATALOG [FTCat1] WITH ACCENT_SENSITIVITY = ON

 

13. Get the name of the primary key (PK) from the table. The PK names will look something like PK__TextDocu__5A5B77D5760F9CB8. 

 

exec sp_helpindex TextDocuments

 

14. Create the FT index on the file_stream column

 

CREATE FULLTEXT INDEX ON [dbo].[TextDocuments] (file_stream TYPE COLUMN [name]) 
KEY INDEX [PK__TextDocum__5A5B77D58E492DCE] -- name from previous step
 ON ([FTCat1]) WITH (CHANGE_TRACKING AUTO)

 

15. Now search using Full-Text Contains() clause

 

SELECT * FROM dbo.TextDocuments
WHERE CONTAINS(file_stream, 'test')

 

2. Using a regular table

16. Create a simple text document in a local folder.  Go to Command Prompt and run a command like this:

 

echo The sea and sunshine are beautiful today > c:\temp\NewTextDocument.txt

 

17. Using the same database created earlier, let's create a regular table with FILESTREAM

 

IF OBJECT_ID('TextDocumentsRegular') IS NOT NULL
  DROP TABLE TextDocumentsRegular
GO
CREATE TABLE TextDocumentsRegular(
Guid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
documentID INT NOT NULL ,
documentType VARCHAR(10) NOT NULL,
TextContent VARBINARY(MAX) FILESTREAM NOT NULL ,
dateinserted DATETIME)

 

18. Insert a row into the table using the newly-created document

 

INSERT INTO TextDocumentsRegular (documentID, documentType, TextContent, dateinserted)
SELECT 101, '.txt', *, GETDATE() FROM OPENROWSET(BULK N'C:\temp\NewTextDocument.txt', SINGLE_BLOB) rs

 

19. Check if the row got inserted, pointing to the document

 

SELECT * FROM TextDocumentsRegular

 

20. Get the primary key or unique key name from the table

 

exec sp_helpindex TextDocumentsRegular

 

21. Create a full-text index using the unique key discovered with above command

 

CREATE FULLTEXT INDEX ON [dbo].TextDocumentsRegular
(Textcontent type column documentType) 
KEY INDEX UQ__TextDocu__A2B5777D1D53476D -- name from prior step
ON ([FTCat1]) 
WITH (CHANGE_TRACKING AUTO)

 

22. Search using Full-Text CONTAINS() clause

 

SELECT * FROM TextDocumentsRegular
WHERE CONTAINS (TextContent, 'sunshine')

 

 

Namaste,

Joseph

%3CLINGO-SUB%20id%3D%22lingo-sub-334010%22%20slang%3D%22en-US%22%3EFilestream%20and%20Full-Text%20-%20Full%20Solution%20for%20Document%20Indexing%20in%20SQL%20Server%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-334010%22%20slang%3D%22en-US%22%3E%3CP%3E%3CSTRONG%3EFirst%20published%20on%20MSDN%20on%20Oct%2023%2C%202017%20%3C%2FSTRONG%3E%20%3CBR%20%2F%3ERecently%20someone%20was%20asking%20if%20there%20is%20a%20simple%20SQL%20Server%20solution%20where%20you%20would%20generate%20text%20documents%20and%20simply%20pass%20them%20to%20SQL%20Server%20to%20catalog%20without%20writing%20an%20application%20to%20do%20this.%20Also%2C%20would%20like%20to%20be%20able%20to%20index%20and%20search%20the%20context%20of%20these%20documents.%20%3CBR%20%2F%3E%3CBR%20%2F%3EThe%20solution%20in%20SQL%20Server%20would%20involve%20Filestream%20and%20Full-Text%20technologies.%20There%20are%20two%20alternatives%3A%20%3CBR%20%2F%3E%3CBR%20%2F%3E%3C%2FP%3E%0A%3COL%3E%0A%3COL%3E%0A%3CLI%3EUse%20FileTable%3C%2FLI%3E%0A%3C%2FOL%3E%0A%3C%2FOL%3E%0A%3CP%3E%26nbsp%3B%3C%2FP%3E%0A%3COL%3E%0A%3COL%3E%0A%3CLI%3EUse%20a%20regular%20table%3C%2FLI%3E%0A%3C%2FOL%3E%0A%3C%2FOL%3E%0A%3CP%3E%3CBR%20%2F%3E%3CBR%20%2F%3EBelow%20is%20a%20sample%20script%20of%20what%20each%20scenario%20would%20look%20like%3A%20%3CBR%20%2F%3E%2F***********************************************%20%3CBR%20%2F%3EUsing%20FileTable%20%3CBR%20%2F%3E************************************************%2F%20%3CBR%20%2F%3Euse%20master%20%3CBR%20%2F%3E%3CBR%20%2F%3Eexec%20sp_configure%20'filestream%20access%20level'%2C%202%20%3CBR%20%2F%3Ego%3C%2FP%3E%0A%3CP%3E--AT%20THIS%20POINT%2C%20YOU%20NEED%20TO%20RESTART%20SQL%20SERVER%3C%2FP%3E%0A%3CP%3E%3CFONT%20style%3D%22background-color%3A%20%23ffffff%3B%22%3E--See%20%3CA%20href%3D%22https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fsql%2Frelational-databases%2Fblob%2Fenable-and-configure-filestream%3Fview%3Dsql-server-2017%22%20target%3D%22_blank%22%20rel%3D%22noopener%20noreferrer%22%3Ehttps%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fsql%2Frelational-databases%2Fblob%2Fenable-and-configure-filestream%3Fview%3Dsql-server-2017%3C%2FA%3E%3C%2FFONT%3E%3C%2FP%3E%0A%3CP%3E%3CBR%20%2F%3Edrop%20database%20FileTableDB_forTextDocs%20%3CBR%20%2F%3Ego%20%3CBR%20%2F%3ECREATE%20DATABASE%20FileTableDB_forTextDocs%20%3CBR%20%2F%3EON%20PRIMARY%20(%20%3CBR%20%2F%3ENAME%20%3D%20N'FileTableDB'%2C%20%3CBR%20%2F%3EFILENAME%20%3D%20N'C%3A%5CProgram%20Files%5CMicrosoft%20SQL%20Server%5CMSSQL13.MSSQLSERVER%5CMSSQL%5CDATA%5CFileTableFileTableDB.mdf'%20)%2C%20%3CBR%20%2F%3EFILEGROUP%20FilestreamFG%20CONTAINS%20FILESTREAM%20(%20%3CBR%20%2F%3ENAME%20%3D%20FileStreamGroup1%2C%20%3CBR%20%2F%3EFILENAME%3D%20'C%3A%5CProgram%20Files%5CMicrosoft%20SQL%20Server%5CMSSQL13.MSSQLSERVER%5CMSSQL%5CDATA%5CFileTableData'%20)%20%3CBR%20%2F%3ELOG%20ON%20(%20%3CBR%20%2F%3ENAME%20%3D%20N'FileTableDB_Log'%2C%20%3CBR%20%2F%3EFILENAME%20%3D%20N'C%3A%5CProgram%20Files%5CMicrosoft%20SQL%20Server%5CMSSQL13.MSSQLSERVER%5CMSSQL%5CDATA%5CFileTableFileTableDB_log.ldf'%20)%20%3CBR%20%2F%3EWITH%20FILESTREAM%20(%20%3CBR%20%2F%3ENON_TRANSACTED_ACCESS%20%3D%20FULL%2C%20%3CBR%20%2F%3EDIRECTORY_NAME%20%3D%20N'TextesFileTable')%20%3CBR%20%2F%3Ego%20%3CBR%20%2F%3E%3CBR%20%2F%3E--validate%20the%20db%20with%20directory%20name%20%3CBR%20%2F%3ESELECT%20DB_NAME%20(%20database_id%20)%2C%20directory_name%2C%20non_transacted_access%2C%20non_transacted_access_desc%20%3CBR%20%2F%3EFROM%20sys.database_filestream_options%20%3CBR%20%2F%3Ewhere%20directory_name%20is%20not%20null%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E%3CBR%20%2F%3Euse%20FileTableDB_forTextDocs%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3ECREATE%20TABLE%20dbo.TextDocuments%20AS%20FileTable%20%3CBR%20%2F%3EWITH%20(%20%3CBR%20%2F%3EFileTable_Directory%20%3D%20'TextesFileTable'%2C%20%3CBR%20%2F%3EFileTable_Collate_Filename%20%3D%20database_default)%3B%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E%3CBR%20%2F%3E--validate%20the%20table%20is%20there%20%3CBR%20%2F%3ESELECT%20*%20FROM%20sys.filetables%3B%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3ESELECT%20*%20FROM%20sys.tables%20WHERE%20is_filetable%20%3D%201%3B%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E--check%20if%20there%20is%20something%20in%20there%20-%20NOTHING%20YET%20%3CBR%20%2F%3Eselect%20*%20from%20dbo.TextDocuments%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E--find%20the%20location%20of%20the%20filetable%20root%20path%20(DIRECTORY_NAME)%20%3CBR%20%2F%3ESELECT%20FileTableRootPath('dbo.TextDocuments')%3B%20%3CBR%20%2F%3E%3CBR%20%2F%3E--Copy%20or%20create%202-3%20simple%20text%20files%20in%20the%20share%20discovered%20via%20the%20above%20command%20%3CBR%20%2F%3E--MAKE%20SURE%20YOU%20CLOSE%20THE%20SHARE%20BECAUSE%20KEEPING%20IT%20OPEN%20IN%20WINDOWS%20EXPLORER%20CAN%20CAUSE%20BLOCKING%20%3CBR%20%2F%3E--Now%20check%20if%20there%20is%20something%20in%20there%20-%20should%20see%202-3%20documents%20%3CBR%20%2F%3E%3CBR%20%2F%3Eselect%20*%20from%20dbo.TextDocuments%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E%3CBR%20%2F%3E--now%20create%20FT%20index%20on%20this%20table%20%3CBR%20%2F%3E--first%2C%20the%20catalog%20%3CBR%20%2F%3E%3CBR%20%2F%3ECREATE%20FULLTEXT%20CATALOG%20%5BFTCat1%5D%20WITH%20ACCENT_SENSITIVITY%20%3D%20ON%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E%3CBR%20%2F%3E--get%20the%20primary%2Funique%20key%20name%20%3CBR%20%2F%3Eexec%20sp_help%20Textdocuments%20%3CBR%20%2F%3E%3CBR%20%2F%3E--create%20the%20FT%20index%20%3CBR%20%2F%3ECREATE%20FULLTEXT%20INDEX%20ON%20%5Bdbo%5D.%5BTextDocuments%5D%20(file_stream%20TYPE%20COLUMN%20%5Bname%5D)%20KEY%20INDEX%20%5BPK__TextDocum__5A5B77D58E492DCE%5D%20ON%20(%5BFTCat1%5D)%20WITH%20(CHANGE_TRACKING%20AUTO)%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3E--Now%20search%20using%20Full-Text%20Contains()%20%3CBR%20%2F%3Eselect%20*%20from%20dbo.TextDocuments%20%3CBR%20%2F%3Ewhere%20contains%20(file_stream%2C%20'word_in_document')%20%3CBR%20%2F%3E%3CBR%20%2F%3E%3CBR%20%2F%3E2.%20Using%20a%20regular%20table%20%3CBR%20%2F%3E%2F*********************************************%20%3CBR%20%2F%3EUsing%20Filestream%20and%20regular%20table%20%3CBR%20%2F%3E**********************************************%2F%20%3CBR%20%2F%3E--Create%20a%20simple%20text%20document%20in%20a%20text%20editor%20and%20save%20it%20like%20this%20'C%3A%5Ctemp%5CNewTextDocument.txt'%20%3CBR%20%2F%3E--Now%20create%20a%20table%20%3CBR%20%2F%3Edrop%20table%20TextDocumentsRegular%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3Ecreate%20table%20TextDocumentsRegular(%20%3CBR%20%2F%3EGuid%20UNIQUEIDENTIFIER%20ROWGUIDCOL%20NOT%20NULL%20UNIQUE%20DEFAULT%20NEWSEQUENTIALID()%2C%20%3CBR%20%2F%3EdocumentID%20int%20not%20null%20%2C%20%3CBR%20%2F%3EdocumentType%20varchar(10)%20not%20null%2C%20%3CBR%20%2F%3ETextContent%20varbinary(max)%20FILESTREAM%20not%20null%20%2C%20%3CBR%20%2F%3Edateinserted%20datetime)%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3EINSERT%20INTO%20TextDocumentsRegular%20(documentID%2C%20documentType%2C%20TextContent%2C%20dateinserted)%20%3CBR%20%2F%3ESELECT%20101%2C%20'.txt'%2C%20*%2C%20GETDATE()%20FROM%20OPENROWSET(BULK%20N'C%3A%5Ctemp%5CNewTextDocument.txt'%2C%20SINGLE_BLOB)%20rs%20%3CBR%20%2F%3EGO%20%3CBR%20%2F%3Eselect%20*%20from%20TextDocumentsRegular%20%3CBR%20%2F%3E%3CBR%20%2F%3E--get%20the%20primary%2Funique%20key%20name%20%3CBR%20%2F%3Eexec%20sp_help%20TextDocumentsRegular%20%3CBR%20%2F%3E%3CBR%20%2F%3E--create%20full-text%20index%20using%20the%20unique%20key%20discovered%20with%20above%20command%20%3CBR%20%2F%3ECREATE%20FULLTEXT%20INDEX%20ON%20%5Bdbo%5D.TextDocumentsRegular%20%3CBR%20%2F%3E(Textcontent%20type%20column%20documentType)%20KEY%20INDEX%20UQ__TextDocum__A2B5777D37D0C9EB%20ON%20(%5BFTCat1%5D)%20%3CBR%20%2F%3EWITH%20(CHANGE_TRACKING%20AUTO)%20%3CBR%20%2F%3E%3CBR%20%2F%3E--search%20using%20full-text%20Contains()%20clause%20%3CBR%20%2F%3E%3CBR%20%2F%3Eselect%20*%20from%20dbo.TextDocumentsRegular%20%3CBR%20%2F%3Ewhere%20contains%20(TextContent%2C%20'word_in_document')%20%3CBR%20%2F%3E%3CBR%20%2F%3E%3CBR%20%2F%3ENamaste%2C%20%3CBR%20%2F%3E%3CBR%20%2F%3EJoseph%20%3CBR%20%2F%3E%3CBR%20%2F%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-TEASER%20id%3D%22lingo-teaser-334010%22%20slang%3D%22en-US%22%3E%3CP%3EFirst%20published%20on%20MSDN%20on%20Oct%2023%2C%202017%20Recently%20someone%20was%20asking%20if%20there%20is%20a%20simple%20SQL%20Server%20solution%20where%20you%20would%20generate%20text%20documents%20and%20simply%20pass%20them%20to%20SQL%20Server%20to%20catalog%20without%20writing%20an%20application%20to%20do%20this.%3C%2FP%3E%3C%2FLINGO-TEASER%3E%3CLINGO-LABS%20id%3D%22lingo-labs-334010%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3ETSQL%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E
Co-Authors
Version history
Last update:
‎Oct 19 2021 07:14 PM
Updated by: