HPI格式说明

xpoy0 · 2010 年4 月 1 日 01:40

http://visualta.tauniverse.com/Downloads/ta-hpi-fmt.txt
Now some of you are undoubtedly looking at an HPI file with a hex dump
program, and saying “That sure doesn’t look like a directory to me!”
Well, you’re right. That’s because it’s encrypted.

To decrypt it, first calculate the decryption key from the HeaderKey
variable:

Key = NOT ((HeaderKey * 4) OR (HeaderKey >> 6))

Doing this on the 0x0000007D, you get FFFFFE0A (I think).

Here is the C code for the decryption routine. Since everything in the
file is encrypted, I found it easier to combine the read and decryption
functions into one.

int ReadAndDecrypt(int fpos, char buff, int buffsize)
/
Read “buffsize” bytes from the HPI file at position “fpos”
into “buff”, and then decrypt it.
*/
{
int count;
int tkey;
int result;

/* first, position the file */
fseek(HPIFile, fpos, SEEK_SET);

/* read the data into buff */
result = fread(buff, 1, buffsize, HPIFile);

/* for each character in buff… */
for (count = 0; count < buffsize; count++) {

/* compute tkey = (fpos + count) XOR Key */
tkey = (fpos + count) ^ Key;

/* and then decode the character:  
   buff[count] = tkey XOR (NOT buff[count]) */

buff[count] = tkey ^ ~buff[count];
}

/* result is the number of bytes actually read in,
and should be equal to buffsize */
return result;
}

Note that the position of the byte in the file (fpos+count) is used to
decrypt.

And here is a decoded directory to make it easy to follow. Note that
I loaded the actual directory starting at offset 0x14, so that the
first 0x14 bytes are all zeros. See the implementation
note above.

All numbers here are 32-bit integers, ie “longs”.

00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
00000010 00 00 00 00 08 00 00 00 1C 00 00 00 64 00 00 00 …d…
00000020 6A 00 00 00 01 97 00 00 00 A0 00 00 00 01 C6 00 j…
00000030 00 00 CF 00 00 00 01 13 01 00 00 1D 01 00 00 01 …
00000040 66 01 00 00 6E 01 00 00 01 94 01 00 00 9D 01 00 f…n…
00000050 00 01 C3 01 00 00 C9 01 00 00 01 EF 01 00 00 F7 …
00000060 01 00 00 01 61 6E 69 6D 73 00 01 00 00 00 72 00 …anims…r.
00000070 00 00 7B 00 00 00 8E 00 00 00 00 61 72 6D 66 6C …{…armfl
00000080 61 6B 5F 67 61 64 67 65 74 2E 67 61 66 00 24 02 ak_gadget.gaf.$.
00000090 00 00 D8 2D 00 00 01 64 6F 77 6E 6C 6F 61 64 00 …-…download.
000000A0 01 00 00 00 A8 00 00 00 B1 00 00 00 BD 00 00 00 …
000000B0 00 41 52 4D 46 4C 41 4B 2E 54 44 46 00 61 28 00 .ARMFLAK.TDF.a(.
000000C0 00 01 01 00 00 01 66 65 61 74 75 72 65 73 00 01 …features…
000000D0 00 00 00 D7 00 00 00 E0 00 00 00 E8 00 00 00 01 …
000000E0 63 6F 72 70 73 65 73 00 01 00 00 00 F0 00 00 00 corpses…
000000F0 F9 00 00 00 0A 01 00 00 00 61 72 6D 66 6C 61 6B …armflak
00000100 5F 64 65 61 64 2E 74 64 66 00 E3 28 00 00 68 02 dead.tdf…(…h.
00000110 00 00 01 6F 62 6A 65 63 74 73 33 64 00 02 00 00 …objects3d…
00000120 00 25 01 00 00 37 01 00 00 43 01 00 00 00 4C 01 .%…7…C…L.
00000130 00 00 5D 01 00 00 00 61 72 6D 66 6C 61 6B 2E 33 …]…armflak.3
00000140 64 6F 00 39 2A 00 00 7B 14 00 00 01 61 72 6D 66 do.9*…{…armf
00000150 6C 61 6B 5F 64 65 61 64 2E 33 64 6F 00 C9 34 00 lak_dead.3do…4.
00000160 00 1A 11 00 00 01 73 63 72 69 70 74 73 00 01 00 …scripts…
00000170 00 00 76 01 00 00 7F 01 00 00 8B 01 00 00 00 41 …v…A
00000180 52 4D 46 4C 41 4B 2E 43 4F 42 00 67 3F 00 00 E4 RMFLAK.COB.g?..
00000190 09 00 00 01 75 6E 69 74 70 69 63 73 00 01 00 00 …unitpics…
000001A0 00 A5 01 00 00 AE 01 00 00 BA 01 00 00 00 41 52 …AR
000001B0 4D 46 4C 41 4B 2E 50 43 58 00 B4 42 00 00 91 25 MFLAK.PCX…B…%
000001C0 00 00 01 75 6E 69 74 73 00 01 00 00 00 D1 01 00 …units…
000001D0 00 DA 01 00 00 E6 01 00 00 00 41 52 4D 46 4C 41 …ARMFLA
000001E0 4B 2E 46 42 49 00 89 63 00 00 39 05 00 00 01 77 K.FBI…c…9…w
000001F0 65 61 70 6F 6E 73 00 01 00 00 00 FF 01 00 00 08 eapons…
00000200 02 00 00 1B 02 00 00 00 61 72 6D 66 6C 61 6B 5F …armflak
00000210 77 65 61 70 6F 6E 2E 74 64 66 00 2D 67 00 00 42 weapon.tdf.-g…B
00000220 02 00 00 01

Let’s get started…

00000010 00 00 00 00 08 00 00 00 1C 00 00 00 64 00 00 00 …d…
^^^^^^^^^^^ ^^^^^^^^^^^

At offset 0x14, you see the number 0x8. This is the number of
entries in the directory. Grabbing the next 32-bit number at
offset 0x18, you get 0x1C. This is the offset of a list of directory entries.
In this case, there are 8 entries in the list. The format of an entry is:

typedef struct _HPIEntry {
long NameOffset; /* points to the file name /
long DirDataOffset; / points to directory data /
char Flag; / file flag */
} HPIEntry;

NameOffset

Pointer to the file name. This is a 0-terminated string of varying length.

DirDataOffset

Pointer to the directory data for the file. The actual data varies depending
on whether it’s a file or a subdirectory.

Flag
If this is 1, the entry is a subdirectory. If it’s 0, it’s a file.

Looking at offset 0x1C, we see:

00000010 64 00 00 00 …d…
00000020 6A 00 00 00 01 97 00 00 00 A0 00 00 00 01 C6 00 j…
00000030 00 00 CF 00 00 00 01 13 01 00 00 1D 01 00 00 01 …
00000040 66 01 00 00 6E 01 00 00 01 94 01 00 00 9D 01 00 f…n…
00000050 00 01 C3 01 00 00 C9 01 00 00 01 EF 01 00 00 F7 …
00000060 01 00 00 01

The 8 entries are:
0x064, 0x06A, 1
0x097, 0x0A0, 1
0x0C6, 0x0CF, 1
0x113, 0x11D, 1
0x166, 0x16E, 1
0x194, 0x19D, 1
0x1C3, 0x1C9, 1
0x1EF, 0x1F7, 1

Let’s look at the first entry. The Flag is 1, so it’s a
subdirectory. At offset 0x64, we see:

00000060 01 00 00 01 61 6E 69 6D 73 00 01 00 00 00 72 00 …anims…r.
^^^^^^^^^^^^^^^^^
or ‘anims’. This is the name. Since this is a subdirectory,
offset 0x6A contains the number of entries in the subdirectory,
followed by a pointer to the first entry. This is exactly like
the count/pointer pair at 0x14 that got us started. Think recursion.

00000060 01 00 00 01 61 6E 69 6D 73 00 01 00 00 00 72 00 …anims…r.
^^^^^^^^^^^ ^^^^^
00000070 00 00 7B 00 00 00 8E 00 00 00 00 61 72 6D 66 6C …{…armfl
^^^^^

The number at offset 0x6A is a 1, indicating that there’s only 1
file in this subdirectory. 0x6E contains the offset of the first
(and only) entry in the subdirectory, which is:

0x7B, 0x8E, 0

The 0 indicates that this is a file. Looking at offset 0x7B, we see:

00000070 00 00 7B 00 00 00 8E 00 00 00 00 61 72 6D 66 6C …{…armfl
^^^^^^^^^^^^^^
00000080 61 6B 5F 67 61 64 67 65 74 2E 67 61 66 00 24 02 ak_gadget.gaf.$.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

or ‘armflak_gadget.gaf’. This is the name of the first (and only) file
in the ‘anims’ subdirectory. Since this is a file, the data at offset
0x8E is a little different.

There are 3 items here instead if one:

typedef struct _HPIFileData {
long DataOffset; /* starting offset of the file /
long FileSize; / size of the decompressed file /
char Flag; / file flag */
} HPIEntry;

DataOffset

This is the offset in the HPI file that this file starts at.

FileSize

This is the decompressed file size. When you extract the file, it
should be this many bytes long.

Flag

If this is 1, the file is compressed with LZ77 compression.
If it’s 2, it’s compressed with ZLIb compression.
If it’s 0, it’s not compressed at all. This is the format used by the
unit viewer.

00000080 61 6B 5F 67 61 64 67 65 74 2E 67 61 66 00 24 02 ak_gadget.gaf.$.
^^^^^
00000090 00 00 D8 2D 00 00 01 64 6F 77 6E 6C 6F 61 64 00 …-…download.
^^^^^ ^^^^^^^^^^^ ^^

Looking at offset 0x8E, we see that the three items are:

0x224, 0x2DD8, 1

If you recall, the directory size was 0x224 bytes. This says the file
starts at the first offset after the directory, which makes sense and means
we’re interpreting things correctly. This also says that the extracted
file should be 0x2DD8 (or 11,736) bytes long.

At this point, we know enough to actually traverse the directory tree in the
HPI file. Here’s a recursive pseudocode function to do it. The initial call
to it would be ‘TraverseTree(“.”, Header.Start)’.

TraverseTree(string ParentName, int offset)

Entries = Directory[offset]
EntryOffset = Directory[offset+4]

for count = 1 to Entries
NameOffset = Directory[EntryOffset]
DataOffset = Directory[EntryOffset+4]
Flag = Directory[EntryOffset+8]

Name = ParentName+""+Directory[NameOffset]

print "Processing ",Name

if Flag = 1
TraverseTree(Name, DataOffset) ← recursion!
else
ProcessFile(Name, DataOffset)
End If

EntryOffset = EntryOffset + 9
Next Count

If you code this up in your language of choice and run it, it
should print something like this: (if you haven’t guessed already,
the file I’m using as an example is the “Arm Flakker” unit’s
aflakker.ufo file)

.\anims
.\anims\armflak_gadget.gaf
.\download
.\download\ARMFLAK.TDF
.\features
.\features\corpses
.\features\corpses\armflak_dead.tdf
.\objects3d
.\objects3d\armflak.3do
.\objects3d\armflak_dead.3do
.\scripts
.\scripts\ARMFLAK.COB
.\unitpics
.\unitpics\ARMFLAK.PCX
.\units
.\units\ARMFLAK.FBI
.\weapons
.\weapons\armflak_weapon.tdf

At this point, I urge you to go look at that directory hex dump and
traverse the thing by hand until it makes sense.

I can hear you now. “What the heck is that ‘ProcessFile’ function?”
It decodes the file. I’ll explain in a bit.

But first, here’s a list of the various files in this HPI file, and
their starting offsets.

If you don’t understand where I got the starting offsets,
go reread the directory hex dump until you do.

anims\armflak_gadget.gaf 0x0224
download\ARMFLAK.TDF 0x2861
features\corpses\armflak_dead.tdf 0x28E3
objects3d\armflak.3do 0x2A39
objects3d\armflak_dead.3do 0x34C9
scripts\ARMFLAK.COB 0x3F67
unitpics\ARMFLAK.PCX 0x42B4
units\ARMFLAK.FBI 0x6389
weapons\armflak_weapon.tdf 0x672D

Because it’s a short file, and because it decodes to readable
text, I’m going to use the ARMFLAK.TDF file as the example.

If the file was not compressed at all, then the file is just inserted
into the HPI file as one big chunk.

But if it is…

This is where the -REAL- fun begins. I’m going to take it slow
here because I’m still half figuring it out myself (writing this
has actually made me realize some things that I hadn’t before).

When the file was compressed, it was broken up into chunks of 64K (65536)
bytes each, plus one more chunk to hold anything left over. Each chunk
was then compressed. Note that some chunks are larger when compressed
than decompressed, which means that some compressed chunks can be larger
than 64K.

The total number of chunks in the file can be obtained by
the following formula:

chunks = Entry.FileSize / 65536
if (Entry.FileSize mod 65536) <> 0
chunks = chunks + 1

The offset in the directory points to a list of 32-bit numbers
that are the compressed sizes of each compressed chunk of data.

Following the list of sizes are the actual compressed chunks
of data.

In this HPI file, each file has only one chunk, but the totala1.hpi file
contains some files with a dozen or so, and the hpi files on the CDs have
files in them with over a hundred.

Going to offset 0x2861, we read in a chunk o’data and decrypt it
to find this:

00002860 7E 00 00 00 53 51 53 48 02 01 01 6B 00 00 00 .~…SQSH…k…
00002870 01 01 00 00 FE 36 00 00 20 5B 51 49 4E 55 3C 0E …6… [QINU<.
00002880 64 64 94 5D 49 5D 11 14 29 7B D5 26 18 55 6E 75 dd.]I]…){.&.Unu
00002890 64 54 34 41 79 6C 9C 71 81 83 8B 3B 59 49 CB 4D dT4Ayl.q…;YI.M
000028A0 43 D1 42 54 9A A5 A8 AA AF B0 B8 64 61 AC 6D B0 C.BT…da.m.
000028B0 B1 82 72 34 79 B8 B0 BD DD A3 82 81 E8 86 AC 89 …r4y…
000028C0 98 92 C2 CF 98 EB 9D E0 56 BF A2 6F AB 5F A8 96 …V…o.…
000028D0 B5 C3 9F B8 EB B9 BE 7D 4F C7 5F CE 2F D1 4C D1 …}O../.L.
000028E0 D0 D2 90

The decompressed file size of ARMFLAK.TDF is 257 bytes. This
tells us that there’s only one chunk. The size of this chunk
is 0x7E bytes. The chunk itself immediately follows.

Each chunk looks like this:

typedef struct _HPIChunk {
long Marker; /* always 0x48535153 (SQSH) /
char Unknown1;
char CompMethod; / 1=LZ77, 2=ZLib /
char Encrypt; / is the block encrypted? /
long CompressedSize; / the length of the compressed data /
long DecompressedSize; / the length of the decompressed data /
long Checksum; / Checksum /
char data[]; / ‘CompressedSize’ bytes of data */
} HPIChunk;

Marker

This is the start-of-chunk marker, and is always 0x48535153 (ASCII ‘SQSH’).

Unknown1
I know not what this is for. It’s always 0x02.
Maybe some sort of version number?

CompMethod
This is the compression method. It’s 1 for LZ77, 2 for ZLib.

Encrypt
This tells whether the block is encrypted a second time. See below.

CompressedSize

This is the size of the compressed data in the chunk. 0x6B bytes.

DecompressedSize

This is the size of the decompressed data in the chunk. 0x101 bytes.

Checksum

This is a checksum of the data. It’s merely the sum of all the bytes of
data (treated as unsigned numbers) added together.

data

The actual compressed data in the chunk. CompressedSize (0x6B) bytes of it.

Let’s look at the data.

00002870 20 5B 51 49 4E 55 3C 0E …6… [QINU<.
00002880 64 64 94 5D 49 5D 11 14 29 7B D5 26 18 55 6E 75 dd.]I]…){.&.Unu
00002890 64 54 34 41 79 6C 9C 71 81 83 8B 3B 59 49 CB 4D dT4Ayl.q…;YI.M
000028A0 43 D1 42 54 9A A5 A8 AA AF B0 B8 64 61 AC 6D B0 C.BT…da.m.
000028B0 B1 82 72 34 79 B8 B0 BD DD A3 82 81 E8 86 AC 89 …r4y…
000028C0 98 92 C2 CF 98 EB 9D E0 56 BF A2 6F AB 5F A8 96 …V…o.…
000028D0 B5 C3 9F B8 EB B9 BE 7D 4F C7 5F CE 2F D1 4C D1 …}O../.L.
000028E0 D0 D2 90

Doesn’t look like much, does it. That’s because (YOU GUESSED IT!) it’s
encrypted yet again! Note: the checksum is calculated BEFORE this
decryption.

The ‘Encrypt’ field in the HPIChunk header is set to 1 to indicate
that this decryption needs to be done.

To decrypt, do this (more pseudocode):

for x = 0 to CompressedSize-1
data[x] = (data[x] - x) XOR x
next x

This gives us:

00002870 20 5B 4D 45 4E 55 30 00 [MENU0.
00002880 54 52 80 59 31 5D 0D 0A 09 7B D1 00 10 55 4E 49 TR.Y1]…{…UNI
00002890 54 22 00 3D 41 52 60 4D 41 43 4B 3B 11 01 83 01 T".=AR`MACK;…
000028A0 33 81 32 02 42 55 54 54 4F 4E B4 02 19 42 01 4E 3.2.BUTTON…B.N
000028B0 41 70 02 C2 01 46 4C 41 DD 23 02 7D E0 04 20 05 Ap…FLA.#.}… .
000028C0 18 00 32 CF 00 D3 01 DE 56 3F 02 4F 03 5F 04 68 …2…V?.O._.h
000028D0 05 33 1F 06 D3 01 3E 41 8F 07 9F 08 AF 09 80 0D .3…>A…
000028E0 00 00 4C …L

Woohoo! Look! Readable word fragments! But remember, the chunk is still
compressed.

In this case, the block is compressed with LZ77, since CompMethod is 1.

The compression algorithm is a very basic sliding window compression scheme
from the LZ77 family using a 4095 byte history and matches from 2 to 17
bytes long.

The first byte is kind of a “tag” byte which determines if the next eight
pieces of data are literal bytes or history matches. Starting with the
least-significant bit, this tag byte is scanned to figure out what to do.

When the current bit is a zero, the next byte of the input is transfered
directly to the output and added to end of the history buffer.

When the current bit is a one, the next two bytes taken from the input are
used as a offset/length pair. The upper 12 bits are the offset into the
history buffer and the lower 4 bits are the length. If the offset is
zero, the end of the input data has been reached and the decompressor
simply exits.

Since we can assume that there will be no matches with a length of zero
or only one byte, any match is a mimimum of two bytes so we just add two
to the length which extends our range from 0-15 to 2-17 bytes.

The match is then copied from the history buffer to the output and also added
onto the end of the history buffer to keep it in sync with the output.

When all eight bits of the tag byte have been used, the mask is reset and
the next tag byte is loaded.

Here is some decompress code:

int Decompress(char *out, char in, int len)
{
/
Decompress buffer “in” of size “len” into buffer “out” (previously
allocated) returns the number of decompressed bytes.
*/

int x;
int outbufptr;
int mask;
int tag;
int inptr;
int outptr;
int count;
int done;
char Window[4096];
int inbufptr;

for (x = 0; x < len; x++) {
in[x] = (in[x] - x) ^ x;
}

done = FALSE;

inptr = 0;
outptr = 0;
outbufptr = 1;
mask = 1;
tag = in[inptr++];

while (!done) {
if ((mask & tag) == 0) {
out[outptr++] = in[inptr];
Window[outbufptr] = in[inptr];
outbufptr = (outbufptr + 1) & 0xFFF;
inptr++;
}
else {
count = *((unsigned short *) (in+inptr));
inptr += 2;
inbufptr = count >> 4;
if (inbufptr == 0)
return outptr;
else {
count = (count & 0x0f) + 2;
if (count >= 0) {
for (x = 0; x < count; x++) {
out[outptr++] = Window[inbufptr];
Window[outbufptr] = Window[inbufptr];
inbufptr = (inbufptr + 1) & 0xFFF;
outbufptr = (outbufptr + 1) & 0xFFF;
}
}
}
}
mask *= 2;
if (mask & 0x0100) {
mask = 1;
tag = in[inptr++];
}
}
return outptr;
}

When fed the data, the routine spits out:

00000000 5B 4D 45 4E 55 45 4E 54 52 59 31 5D 0D 0A 09 7B [MENUENTRY1]…{
00000010 0D 0A 09 55 4E 49 54 4D 45 4E 55 3D 41 52 4D 41 …UNITMENU=ARMA
00000020 43 4B 3B 0D 0A 09 4D 45 4E 55 3D 33 3B 0D 0A 09 CK;…MENU=3;…
00000030 42 55 54 54 4F 4E 3D 33 3B 0D 0A 09 55 4E 49 54 BUTTON=3;…UNIT
00000040 4E 41 4D 45 3D 41 52 4D 46 4C 41 4B 3B 0D 0A 09 NAME=ARMFLAK;…
00000050 7D 0D 0A 0D 0A 5B 4D 45 4E 55 45 4E 54 52 59 32 }…[MENUENTRY2
00000060 5D 0D 0A 09 7B 0D 0A 09 55 4E 49 54 4D 45 4E 55 ]…{…UNITMENU
00000070 3D 41 52 4D 41 43 56 3B 0D 0A 09 4D 45 4E 55 3D =ARMACV;…MENU=
00000080 33 3B 0D 0A 09 42 55 54 54 4F 4E 3D 33 3B 0D 0A 3;…BUTTON=3;…
00000090 09 55 4E 49 54 4E 41 4D 45 3D 41 52 4D 46 4C 41 .UNITNAME=ARMFLA
000000A0 4B 3B 0D 0A 09 7D 0D 0A 0D 0A 5B 4D 45 4E 55 45 K;…}…[MENUE
000000B0 4E 54 52 59 33 5D 0D 0A 09 7B 0D 0A 09 55 4E 49 NTRY3]…{…UNI
000000C0 54 4D 45 4E 55 3D 41 52 4D 41 43 41 3B 0D 0A 09 TMENU=ARMACA;…
000000D0 4D 45 4E 55 3D 33 3B 0D 0A 09 42 55 54 54 4F 4E MENU=3;…BUTTON
000000E0 3D 33 3B 0D 0A 09 55 4E 49 54 4E 41 4D 45 3D 41 =3;…UNITNAME=A
000000F0 52 4D 46 4C 41 4B 3B 0D 0A 09 7D 0D 0A 0D 0A 0D RMFLAK;…}…
00000100 0A .

Yay! Clear decoded text. Write this chunk out, and go get the next one.
When there are no more chunks, close the file, and go process the next one.

To recompress, do something like the following:

WHILE look ahead buffer is not empty
find a match in the window to previously output data
IF match length > minimum match length
output reference pair
move the window match length to the right
ELSE
output window first data item
move the window one to the right
ENDIF
END

If CompMethod is 2, use ZLib compression to decompress the block. You can get the zlib
source code from the zlib home page at http://www.cdrom.com/pub/infozip/zlib/

From here, you’ve got enough data to proceed on your own. Good luck!

Like I said, if you have any questions, let me know.

Joe D
[email protected]

这个太棒了，考虑翻译。

hanbyte · 2010 年4 月 1 日 04:15

支持翻译~ >_<

真人游戏|足球篮球|时时◇彩| 六合投◇注| 网络赚钱去SO娱◇乐◇城:顶级信用◇提现百分百即时到账SO.CC