
Linwood
Copper Contributor
Jun 25, 2019

Migrating from Server 2003 to Server 2019 using Storage Migration Service

I discovered the new Storage Migration tool in Windows 2019 server, and we have an old 2003 server we needed to migrate.  Both are member servers of the same domain.  

 

I have run it multiple times, twice from scratch, also deltas. 

 

It is a royal mess.  The servers are on the same LAN, connected by 1 Gb Ethernet, and both are virtualized.  It takes FOREVER to run, despite only having about 700 GB and around 500,000 files.  That's OK for the initial copy, but the deltas need to go faster (the last one took 3 days), as otherwise we can never cut over.  We need a way to get a final delta in an hour or at most a few.
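For scale, here is a rough back-of-envelope sketch of those numbers. It assumes the link is 1 gigabit per second, that "700GB" means decimal gigabytes, and that the 3-day delta touched all 500,000 files; the function names are made up for illustration.

```python
def wire_time_hours(total_bytes, link_bits_per_second):
    """Hours needed to push total_bytes over a link at full wire speed."""
    return total_bytes * 8 / link_bits_per_second / 3600


def files_per_second(file_count, elapsed_seconds):
    """Average number of files processed per second."""
    return file_count / elapsed_seconds


# Assumption: 700 GB (decimal) over a 1 Gb/s link with no protocol overhead.
full_copy_hours = wire_time_hours(700e9, 1e9)

# Assumption: the 3-day delta pass examined all 500,000 files.
delta_rate = files_per_second(500_000, 3 * 24 * 3600)

print(f"Ideal full copy: {full_copy_hours:.2f} hours")
print(f"Observed delta rate: {delta_rate:.2f} files/second")
```

Under those assumptions the whole data set fits through the wire in under two hours, while the delta averaged about two files per second, so raw bandwidth is unlikely to be the limit.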

 

But that is not the real problem.  The job says "Couldn't transfer some devices" and every share says "couldn't transfer some files".   The D$ share is the only one that shows activity (probably appropriate as it is only that drive and doing other share names explicitly would be duplicates), and shows 96,645 failed files.

 

There appears to be no way I can find to gain an understanding of WHY they are not copied.  Looking in the debug logs I see all sorts of errors like "An unexpected network error occurred" or "the specified network name is no longer available" or "the handle is invalid" or "access is denied". 

 

I've done many hours of transfers with Goodsync and with Robocopy as a comparison, and not one network error.  So if these are being caused, with Storage Migration, by network errors, I cannot see why.

 

I ASSUME it is using the backup API; we do indeed have some files that are protected against Domain Admin, but I am running it with an account that is both Domain Admin and explicitly Backup Operator (though I think that is assumed as an administrator).  Many if not most of the files I spot checked, that gave errors, ARE on the new server, and look correct.

 

But missing almost 100,000 files is a real issue.

 

I installed the latest version (1.25.0 the first time I think, later installed 1.42.0) which purports to have better error messages, and reran (seemed even slower) but saw no more useful messages.  When I look at files that gave errors, nothing jumps out at me.  I can copy them manually for example, so there's not a disk corruption issue.

 

I don't really know where to start to get a handle on the problem.  

 

I did set it up to use CRC64 for checksum validation, did not set it up to back up the folders that would be overwritten, and set a 10080-minute duration, 3 retries, and 60 seconds between retries.

 

Most files I see in the debug log as failures are actually present (yet I did not see a debug entry for a successful copy in the few I dug deeper for).  Some files not present are on the source protected against admin access, BUT it should be using the backup API, right?   But other files similarly protected are moved.

 

There seems no rhyme or reason to what works and does not, but 100,000 or so failures is too much to deal with individually.  I'm not quite sure where to go.

 

And robocopy, which I tried a bit ago on a subset, works perfectly, not one error.  But I would rather use Storage Migration as it moves the shares themselves.  I may have to revert to old school, though.

 

Any advice?  What sorts of things could I do to try to debug the failures?   The debug log shows LOTS of failures, but no real cause for them.  I can manually copy the same file that shows a failure.  And again, running robocopy (with 8 threads) does not show a single error (on a subset).

 

Thanks in advance, 

 

Linwood

10 Replies

  • NedPyle
    Former Employee
    Hi. Sorry to hear you’re having trouble. A few thoughts:

    1. Enabling checksums will make things much much slower (this is a natural side effect of CRC calculation over the wire, it’s just going to be slow).

    2. Your errors sound like this known issue. Can you confirm you have installed that fix? https://docs.microsoft.com/en-us/windows-server/storage/storage-migration-service/known-issues#certain-files-do-not-inventory-or-transfer-error-5-access-is-denied

    We’re working on some fixes to make rescans faster, those should be released in the next month or two. Without using checksums, first scans will be similar to robocopy speeds. I have an article on speeding up your transfers here too:

    https://docs.microsoft.com/en-us/windows-server/storage/storage-migration-service/faq#optimize

    Ned Pyle
    • Linwood
      Copper Contributor

      NedPyle  thanks.  We gave up and moved to robocopy.

       

      I had tried it again without CRC without significant impact on performance.

       

      We had the OS fully patched in late June, so I assume it had the April update, unless you are saying that update is not in the normal stream of Windows updates.

       

      For us, in our situation, for whatever reason it was simply not usable -- too many files disappeared, and there was no good way to account for them and ensure we could find and fix all the issues.  It was too risky to use the tool the way it works, notably the way it reports errors.  There should be a consolidated summary (an as-of summary after reruns as well) that shows open issues - files not sync'd and why.  You shouldn't have to spend hours trying to find them in event logs, nor should you have to resort (as I did) to other tools to produce complete directory listings and checksums and then diff the old and new drives.
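The listing-and-checksum comparison described above can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used in this thread: it assumes both trees are reachable from one machine, uses SHA-256 (an arbitrary choice), and the function names are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for folder, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(folder, name), root))
    return found


def compare_trees(source_root, dest_root):
    """Return (missing, mismatched) lists of relative paths, each sorted.

    missing: files present under source_root but absent under dest_root.
    mismatched: files present in both whose contents differ.
    """
    source_files = list_files(source_root)
    dest_files = list_files(dest_root)
    missing = sorted(source_files - dest_files)
    mismatched = sorted(
        rel for rel in source_files & dest_files
        if file_digest(os.path.join(source_root, rel))
        != file_digest(os.path.join(dest_root, rel))
    )
    return missing, mismatched
```

A call such as `compare_trees(r"\\oldserver\D$", "D:\\")` (placeholder paths) would return the two lists to review. Paths are compared exactly as listed, so names that differ only in case would show up as missing, and the account running it needs read access to every file.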

       

      The risk of data loss given its poor performance, combined with poor error reporting, was just too great.

       

      I hope there's a version 2 that would work - robocopy and checksum tools are not great tools.  But they work.  The simplicity of a log file that only shows errors (and very few of them, like locked files) is reassuring. 

       

      Linwood

      • NedPyle
        Former Employee

        Linwood Thanks. In the future, please do not give up and move to robocopy - open a support case so we can investigate. If there's a bug, the case is free. Otherwise, nothing will improve (robocopy spent 20 years being improved through support cases :) ). We have had tens of thousands of migrations, moved 10PB of data, and no one has reported the exact symptoms you are reporting, we'd like to understand what happened here.

         

        That turning off CRC didn't help performance means there was something very wrong going on in this migration; the difference will always be dramatic. Regarding the logs - did you look at the CSV logs that you can download after transfers and find them to be inadequate? You shouldn't need to look at event logs, we have transfer logs for this reason - both complete and error-only. 
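If the error-only CSV log is available, one way to get a consolidated view is to count rows per failure reason. The sketch below is only an illustration: the column layout of the Storage Migration Service CSV is not shown in this thread, so the column name `Error` is a placeholder to be replaced with whatever header the downloaded file actually uses.

```python
import csv
from collections import Counter


def summarize_errors(csv_path, reason_column="Error"):
    """Count rows per distinct value of reason_column in a CSV log.

    reason_column is a hypothetical header name; check the real file.
    Returns a list of (reason, count) pairs, most frequent first.
    """
    counts = Counter()
    # utf-8-sig strips a byte-order mark if the file starts with one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            counts[(row.get(reason_column) or "").strip()] += 1
    return counts.most_common()
```

Rows that lack the chosen column are grouped under an empty string, which makes a wrong column name easy to spot.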

  • Linwood
    Copper Contributor

    I ran this again on a more limited set of 24,205 files, and it failed on about 60.  I grabbed one and looked in depth without finding anything wrong. I can copy the same file pulled from the destination system with just a "copy" statement, so it's not being blocked by anti-virus or disk corruption.


    I tracked it down in the event log and the error is simply: 

     

    (64) The specified network name is no longer available.

     

    Bear in mind that every test I have done with other tools, like robocopy or goodsync, has succeeded without error.  If there are network issues they are not showing up in other tools. 

     

    I am aware of this: https://support.microsoft.com/en-us/help/961293/unable-to-access-shares-the-specified-network-name-is-no-longer-availa

     

    But we are running 12.1.1101.401 RU1 MP1, so it's much later.  It is running only on the source, not the destination.  I cannot turn it off due to policy (this is a production site in 24x7 use).  And again, no other tools are hitting that error. 

     

    I am running it again to see if it fails on the same files, but it takes an astoundingly long time.  These folders took about an hour with robocopy (I did not time it precisely), but took Storage Migration about 10 hours.  And I ran SMS in a relatively idle time, and robocopy during prime business hours.


    And yes, I am running SMS on Windows 2019 server, fresh install, fully patched, with proxy server, with the SMS component updated to the latest version.

     

    Any ideas?   Is it really this slow?   Is it really this unreliable?

     

     

     

    • Linwood
      Copper Contributor

      I should have included the full error text: 

       

      "06/26/2019-02:17:41.225 [Erro] Transfer error for \\***pathRedacted***\PTWin32\Archive-032206\Data\Acctcode.px: (64) The specified network name is no longer available.
      Stack Trace:
      at Microsoft.StorageMigration.Proxy.Service.Transfer.FileDirUtils.GetTargetFile(String path)
      at Microsoft.StorageMigration.Proxy.Service.Transfer.FileDirUtils.GetTargetFile(FileInfo file)
      at Microsoft.StorageMigration.Proxy.Service.Transfer.FileTransfer.InitializeSourceFileInfo()
      at Microsoft.StorageMigration.Proxy.Service.Transfer.FileTransfer.Transfer()
      at Microsoft.StorageMigration.Proxy.Service.Transfer.FileTransfer.TryTransfer() [d:\os\src\base\dms\proxy\transfer\transferproxy\FileTransfer.cs::TryTransfer::55]"
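With tens of thousands of entries like the one above, a small script can tally them by error code instead of reading them one at a time. The sketch below assumes every failure line follows the same layout as the quoted entry (timestamp, `[Erro]`, `Transfer error for <path>: (<code>) <message>`); other line shapes in the debug log are simply skipped, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the single log line quoted above; other formats may exist.
ERROR_LINE = re.compile(
    r"(?P<timestamp>\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[Erro\] Transfer error for (?P<path>.+): "
    r"\((?P<code>\d+)\) (?P<message>.*)"
)


def parse_transfer_errors(lines):
    """Return a list of (path, code, message) for lines matching ERROR_LINE."""
    errors = []
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            errors.append(
                (match["path"], int(match["code"]), match["message"].strip())
            )
    return errors


def count_by_code(errors):
    """Return ((code, message), count) pairs, most frequent first."""
    tally = Counter((code, message) for _path, code, message in errors)
    return tally.most_common()
```

Feeding it an open log file, for example `count_by_code(parse_transfer_errors(open(log_path, encoding="utf-8")))` with `log_path` pointing at an exported debug log, shows whether the failures are dominated by one code such as 64 or spread across several. The file encoding is an assumption.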
