Debugging UCSI firmware failures
Published Nov 06 2018 06:24 PM 19.7K Views
Microsoft

Background

The UCSI driver in Windows communicates with the firmware UCSI component (called PPM or Platform Policy Manager) through the spec-defined command notification interfaces. While driver failures can be tracked using Windows Error Reporting or WER and driver traces, the firmware may run into failures as well which might go undetected at first but result into loss of functionality later in time. UCSI compliant machines in the wild do run into such failures every day in the tens of thousands.
This document intends to describe a way in which an OEM or a firmware developer may avoid these PPM failures pro-actively running some stress tests.

  1. The most common PPM failure is UCSI command timeout.
  2. UcmUcsiCx.sys is the inbox component in Windows that communicates with PPM. Here is a typical message flow between the driver and the firmware.
  3. UcmUcsiCx.sys driver sends a UCSI command to PPM
    PPM asynchronously sends command complete notification (CCI:: Command Complete Indicator*).
  4. The driver receives the command complete notification and then send Ack to PPM (ACK_CC_CCI)
    PPM asynchronously send ack complete notification (CCI::Acknowledge Command Indicator).
    The driver waits for 10 seconds for an asynchronous notification (steps (2) and (4)) before giving up and generating failure report or a livedump.

*Note that the above steps are performed for all UCSI commands except PPM_RESET since after PPM_RESET the PPM Is expected to be in reset state where no notifications are enabled. The driver then polls for CCI::Reset Complete Indicator to be true.

 

Converting firmware failures to bugchecks

One way in which an OEM or a firmware developer can catch these failures in the lab or development environment is the configure the machine to bugcheck generating a kernel crashdump rather than  livedump which may go undetected. In addition to repro the failure, we advise an OEM/firmware developer to use Connection Exerciser Stress test: CxStress. CxStress is a part of MUTT software pachage found in this location.

  • Setup Connection exerciser: An example connection is shown in the following picture where the Type-C connector of the SUT is connected to the connection exerciser. The 4 Type-C ports of the connection exerciser is connected to 4 different kinds of the partners. For example, one or more alt-mode docks (if applicable), one or more PD chargers, a Type-C MUTT or other USB-C devices like a USB-C mass storage device. Refer to the picture below for an example setup.ucsi setup.png
  • Configure UcmUcsiCx.sys to elevate livedump to crashdump
    • Locate the device node in Device Manager (devmgmt.msc) named UCM-UCSI ACPI Device. The node is under the USB Connector Manager
    • Right-click on the device, and select Properties and open the Details
    • Select Device Instance Path from the drop-down and note the property value.
    • On an elevated command prompt
      • reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\<device-instance-path>\Device Parameters" /v ForceCommandFailureCrash /t REG_DWORD /d 0x01
    • Restart the device by selecting the Disable option on the device node in Device Manager, and then selecting Enable. Alternatively, you can simply restart the PC.
  • Run CxStess.cmd (typically found in “C:\Program Files (x86)\USBTest\x64”). For completeness, we recommend you to run this test overnight (~8 hours).
  • Keep an eye on bugchecks that might happen due to the above test. The next section talks about how to go about finding their root cause.

Analyzing Firmware Failures

This section goes through analysis of an example crash that occurred when UcmUcsiCx driver was configured to generate a crashdump instead of a livedump. Load the dump file in Windbg using <dir_path_to_windbg>\windbg.exe -z <path_to_memory_dump> from command line or run window. Refer analyzing kernel mode dumps for more details.

Run !analyze to check the failure type. Bugcheck code: 0x1D4 is the UCSI failure code.

 

1: kd> !analyze -v

UCMUCSI_FAILURE (1d4)
The UcmUcsi driver has encountered an error.
Arguments:
Arg1: 0000000000000000, A UCSI command has timed out because the firmware did not respond to the command in time.
Arg2: 0000000000000005, The UCSI command value.
Arg3: ffff988646b2f8d0, If non-zero, the pointer to additional information (dt UcmUcsiCx!UCMUCSICX_TRIAGE).
Arg4: 0000000000000000

Debugging Details:
..

 

 

Get UcmUcsiCx driver traces. The following text contains the last few lines of the trace which we are interested in. To dump the driver traces, we rely on a windbg command called 'rcdrlogdump' found in 'rcdrkd'. The last few entries in the traces are related to the driver trying to stop the state machine and generating a dump. Look for the state machine event called CommandTimeOut and retrace the state machine transitions back a few steps to find out which command has timed out. In this specific example, it happens to be SetNotificationEnable UCSI command. Refer line 1076, 1083 and 1085 in the following WPP trace dump from Windbg.

1: kd> !rcdrkd.rcdrlogdump ucmucsicx
Trace searchpath is: 

Trace format prefix is: %7!u!: %!FUNC! - 
Trying to extract TMF information from - C:\ProgramData\Dbg\sym\UcmUcsiCx.pdb\DA9957412F663B283F7EF8BE9406D0551\UcmUcsiCx.pdb
--- start of log ---
…
…
…
1073: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: External][Source: WaitingForCommand][Event : CommandAvailable][Target: ProcessNextCommand]
1074: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: Call][Source: ProcessNextCommand][Event : _noevent_][Target: RetreiveCommandType]
1075: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: External][Source: RetreiveCommandType][Event : AsyncCommand][Target: SendNewCommandToPPM]
1076: UcmUcsiCx::CommandHandler::SendUCSICommandToPpm - [UCMUCSIPPM: 0x0000507062114698] Sending Command UcsiCommandSetNotificationEnable to client


1077: UcmUcsiCx::CommandHandler::EvtLogEventEnqueue - [COMMAND HANDLER] SmFx event: RequestCompletedSuccessfully
1078: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: Call][Source: SendNewCommandToPPM][Event : _noevent_][Target: WaitingForRequestCompletion]
1079: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: ExplicitPop][Source: WaitingForRequestCompletion][Event : RequestCompletedSuccessfully][Target: SendNewCommandToPPM]
1080: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: External][Source: SendNewCommandToPPM][Event : Succeeded][Target: StartingCommandCompleteTimer]
1081: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: External][Source: StartingCommandCompleteTimer][Event : Done][Target: WaitingForCommandComplete]
1082: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: Call][Source: EnableCmdCompleteNotificationOnPowerUp][Event : _noevent_][Target: WaitingForCommandCompletion]
1083: UcmUcsiCx::CommandHandler::PrettyPrintCCI - [CCI] ConnChange:0x0 | DataLength:0x0 | NotSupported:0 | CancelCompleted:0 | ResetCompleted:1 | Busy:0 | AckCommand:0 | Error:0 | CommandComplete:0 |
1084: UcmUcsiCx::CxDevice::EvtDeviceWdmPreprocessSetPowerIrp - [WDFDEVICE: 0x000050706D0133C8] System Power Down to SYSTEM_POWER_STATE:0x5
1085: UcmUcsiCx::CommandHandler::EvtLogEventEnqueue - [COMMAND HANDLER] SmFx event: CommandTimedOut
1086: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: ExplicitPop][Source: WaitingForCommandComplete][Event : CommandTimedOut][Target: ProcessNextCommand]
1087: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: ImplicitPop][Source: ProcessNextCommand][Event : UnrecoverableFailure][Target: Started]
1088: UcmUcsiCx::CommandHandler::EvtLogTransition - [COMMAND HANDLER] SmFx transition [Transition: External][Source: Started][Event : UnrecoverableFailure][Target: Failed]
1089: UcmUcsiCx::Opm::EvtCommandCompletionReceived - [UCMUCSIPPM: 0x0000507062114698] Response timed out for command UcsiCommandSetNotificationEnable, 0x00000102(STATUS_TIMEOUT)
1090: UcmUcsiCx::Opm::EvtLogEventEnqueue - [OPM] SmFx event: Stop
1091: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: External][Source: WaitingForCommandCompletion][Event : Stop][Target: WaitingForCommandCompletionAfterStop]
1092: UcmUcsiCx::Opm::EvtLogEventEnqueue - [OPM] SmFx event: CommandComplete
1093: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: ExplicitPop][Source: WaitingForCommandCompletionAfterStop][Event : CommandComplete][Target: EnableCmdCompleteNotificationOnPowerUp]
1094: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: ImplicitPop][Source: EnableCmdCompleteNotificationOnPowerUp][Event : Stop][Target: PowerDownAndWaitForPowerUp]
1095: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: ImplicitPop][Source: PowerDownAndWaitForPowerUp][Event : Stop][Target: PpmOperational]
1096: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: ExplicitPop][Source: PpmOperational][Event : Stop][Target: Started]
1097: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: External][Source: Started][Event : Stop][Target: PurgingUnserviceCommandsDueToStop]
1098: UcmUcsiCx::Opm::EvtLogTransition - [OPM] SmFx transition [Transition: External][Source: PurgingUnserviceCommandsDueToStop][Event : Done][Target: StopCommandHandlerMachine]
1099: UcmUcsiCx::CommandHandler::EvtLogEventEnqueue - [COMMAND HANDLER] SmFx event: Stop
1100: UcmUcsiCx::CxDevice::UnrecoverableFailureEncountered - [WDFDEVICE: 0x000050706D0133C8] An unrecoverable failure has been encountered. The device will now restart.
---- end of log ----

To summarize the above failure, UcmUcsiCx driver sends SetNoficationEnable command to the firmware and the firmware fails to send command complete notification within the timeout period of 10 seconds. Instead it sends a Reset Complete notification.

When such a failure occurs, we advise an OEM with work with firmware developer to fix the problem.

 

2 Comments
Version history
Last update:
‎Nov 07 2018 01:59 PM
Updated by: