Azure Sphere 20.07 Security Enhancements

JewellSeay · ‎Jul 31 2020

The release of 20.07 brought along a range of security enhancements and changes to Azure Sphere. As head of the Operating System Platform (OSP) Security team, I want to provide more insights into the efforts to keep Azure Sphere secure as a platform while being as transparent as possible about all the improvements made since our 20.04 release.

The OSP Security team I run for Azure Sphere worked with a diverse group of people and companies to have three separate red team events happen on the platform over the last few months; an internal Microsoft team, Trail of Bits, and the currently active Azure Sphere Security Research Challenge (ASSRC). These efforts are on top of the continued work done by the OSP team over the last three months to harden and further the security of the platform.

Trail of Bits performed a private red team exercise on the system and identified a number of risks that have been fixed for 20.07:

The /proc file system on Linux is mounted rw giving the ability to write to /proc/self/mem allowing unsigned code execution, /proc/self/mem is now read-only
The internal GetRandom() function failed to properly fill in buffers that have a length that is not a multiple of 4, all current usage is a multiple of 4 so no risk and code was changed to avoid future impacts
The Sysctl Linux kernel configuration flags can be hardened, following values enabled
- kernel.kptr_restrict = 1 - limit Linux kernel pointer leakage
- kernel.dmesg_restrict = 1 - prevent access to dmesg to unprivleged processes
- fs.protected_hardlinks = 1 - users cannot create hardlinks unless they own the source file
- fs.protected_symlinks = 1 - symlinks are only followed when not in a world-writable directory, the owner of the symlink and follower match, or the directory owner and fsymlink owner match
- fs.protected_fifos = 2 - limit FIFO creation options when dealing with world writable directories
- fs.protected_regular = 2 - limit regular file creation options when dealing with world writable directories
The internal security library for setting process information used to return success even if failed to set a process' capabilities
DNS name expansion leaked stack memory
Null pointer dereferences in DMA memory for mtk3620 in the Linux kernel
Better filtering of the content-type for GatewayD to limit cross-site scripting abuse

Although the ASSRC is still on-going it has provided a range of great findings by the participants, some of which overlapped ToB's findings like a writable /proc/self/mem. The Linux kernel related issues identified by ToB were not fixed in 20.05 or 20.06 due to the massive Linux kernel uprgade from 4.9 to 5.4, this oversight will be handled better in the future.

Cisco Talos reported the first 2 findings that are fixed in 20.07, ptrace used to bypass the unsigned code execution protections and the Linux kernel message ring buffer being user accessible allowing for information leakage. Along with reporting the first two findings, Cisco Talos also reported the /proc/self/mem finding and found a double free in the azspio Linux kernel driver that have been fixed. Cisco has a blog post up detailing their efforts so far for the ASSRC.

As an excellent example of findings from the ASSRC effort, I would like to describe a specific attack chain that McAfee Advanced Threat Research found for the device that has been fixed for 20.07. This attack chain did require physical access to a device and could not be done remotely due to the steps involved.

We have multiple environments that devices can be part of, two of them are pre-production and production. McAfee ATR was able to claim a device to both preproduction and production across separate tenants. Due to an oversight in signature handling for device capability images on the cloud, an attacker that claimed a device they did not own to pre-production was allowed to request a capability image for the device that was production signed. This allowed obtaining a capability image for a device and gaining access to a locked down device, this was corrected immediately.
With the ability to get a capability image for a device, McAfee ATR could unlock a locked down device and also obtain the development capability allowing them to upload their own package to the device. An application package is a signed ASXIPFS image, our file system that is a variation of CramFS with the ability to execute from flash. The original file system code allowed for special inode filesystem entries which McAfee ATR used to create a special inode pointing to the MTD flash giving them read-only access to the on-device flash. 20.07 removes the ability to create any special inodes in the ASXIPFS image. Cisco also found and reported this vulnerability.
Although the user controlled special inode pointing to flash is read-only due to how the file system image is mounted, McAfee ATR found a 0-day in the Linux kernel for the MTD_WRITE ioctl. The ioctl function failed to check permissions before executing the ioctl call allowing flipping bits in flash from 1 to 0 allowing McAfee ATR to use this vulnerability to rewrite the uid_map file for the device. This is patched on our Linux kernel for 20.07 and publicly fixed with https://lkml.org/lkml/2020/7/16/430. Cisco also found and reported this vulnerability.
With the ability to modify flash, McAfee ATR rewrote the uid_map file that maps user IDs to applications to have an application with a maximum user ID where all bits in the user ID are set. The maximum ID resulted in the setresuid and setresgid function calls being passed -1. A -1 value to these functions is a special flag to indicate that the user id and group id should not be modified resulting in an application being ran as the sys user. The uid map parsing code no longer allows for a maxium user ID where all bits are set.
Under normal operation, a system package can not be uninstalled, however an application running as sys is allowed to modify and change symlinks in various directories resulting in the ability to abuse symlink confusion on packages and force the azcore package to be uninstalled when the uninstaller thinks it is uninstalling another application. From there a user package can be installed in it's place. Symlinks are no longer allowed in user application packages as of 20.07.
The user application that replaces azcore does have the proper uid and gid set when executed however because the Linux kernel executes it the capability bits were never cleared out. The user replaced azcore is then able to call setuid(0) and become root on the device. This will be fixed in 20.08.

McAfee ATR did a fantastic job putting together this attack chain and finding a 0-day in the core Linux kernel itself to make it work. The attack chain exposed a weakness in the cloud and multiple weakenesses on the device including a previously unknown Linux kernel vulnerability.

While the above changes were done as a result of external red team findings, the Operating System Platform team continued improving the security of Azure Sphere.

One effort we've been working on is minimizing the ability to use ptrace unless in development mode. PTrace is needed by gdb to properly provide debug information however normal customer applications do not have a need for it. Having ptrace be available to customer applications allows an attacker to ptrace the process being attacked and inject unsigned code into memory for execution. 20.07 brings along a Linux kernel change where ptrace is no longer possible unless in development which also brings along a few extra enhancements as a side effect, the largest being that /proc no longer shows any other process pid and is further restricted of what a process can know about itself.

Another security enhancement is moving to wolfSSL 4.4.0 bringing along additional side channel attack hardening. Along with the wolfSSL upgrade is work to begin exposing access to supported wolfSSL functionality, the first set of functions allowing customers to directly call wolfSSL for establishing TLS client connections.

We have added more fuzzing across 5 different components and additional static code analysis tools including extra static analysis tools on every pull request into our repositories. If the static analysis fails then the PR can not be completed, this further strengthens the system by making it more difficult to check in easy to abuse coding flaws. As we expand to add features and functionality more fuzzers are built for parts of the system being updated. The addition of the new static analysis tool detected an off by one calculation in DHCP message handling that allowed reading an extra byte of data past the end of the buffer, this was corrected in 20.07.

You may have noticed that our last couple quality releases did not have a Linux kernel patch bump, this time was used to allow the Linux kernel team to upgrade the Linux kernel from 4.9 to 5.4.44. By doing so we capture Linux kernel security enhancements done between the versions along with keeping up-to-date on the latest changes.

String manipulation functions are a very common way for leaking the stack cookie along with being able to write it when string buffers are not properly null terminated. GLibC helps limit string buffer attacks by forcing the first byte of the stack cookie in memory to be 0 however we use musl on the device for libc. Musl initializes all bytes in the stack cookie instead of leaving the first byte in memory 0 allowing for the potential of stack cookie leaks and abuses. Our version of musl in 20.07 sets the first byte to 0 and the patch was provided to the maintainer incase they wish to add this security measure to musl.

On top of our own changes, MediaTek provided a new version of the firmware for their WiFi subsystem of their MT3620 that is now being used on the platform to deal with a range of issues.

As you can see, a wide range of security improvements have been made to the platform as we continue to strive to be the best in the field. We will continue to be transparent about our efforts and are devoted to being the most secure platform for IoT.

Jewell Seay
Azure Sphere OSP Security Team Lead

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Azure Sphere 20.07 Security Enhancements