dcc Distribution
The code provided in this distribution is © Cristina
Cifuentes and the Queensland University of Technology (QUT). Mike Van
Emmerik and Jeff Ledermann worked for QUT while working on this project.
The contributions of the authors were as follows:
- Cristina Cifuentes (most of dcc code)
- Mike van Emmerik (signatures and prototype code, etc)
- Jeff Ledermann (some disassembly code)
dcc is provided "as is" under the GPL license.
Contents of the distribution
The following files are included in the
dcc.tar.gz distribution:
- dcc.zip (dcc.exe DOS program, 1995)
- dccsrc.zip (source code *.c, *.h for dcc,
1993-1994)
- dcc32.zip (dcc_oo.exe 32 bit console (Win95/Win-NT)
program, 1997)
- dccsrcoo.zip (source code *.cpp, *.h for "oo" dcc,
1993-1997)
- dccbsig.zip (library signatures for Borland C compilers, 1994)
- dccmsig.zip (library signatures for Microsoft C compilers, 1994)
- dcctpsig.zip (library signatures for Turbo Pascal compilers,
1994)
- dcclibs.dat (prototype file for C headers, 1994)
- test.zip (sample test files: *.c *.exe *.b, 1993-1996)
- makedsig.zip (creates a .sig file from a .lib C
file, 1994)
- makedstp.zip (creates a .sig file from a Pascal library
file, 1994)
- readsig.zip (reads signatures in a .sig file, 1994)
- dispsrch.zip (displays the name of a function given a signature,
1994)
- parsehdr.zip (generates a prototype file (dcclibs.dat)
from C *.h files, 1994)
Note that the dcc_oo.exe program (in
dcc32.zip) is a 32 bit program, so it won't work under Windows 3.1. Also, it is
a console mode program, meaning that it has to be run in the "Command Prompt"
window (sometimes known as the "Dos Box"). It is not a GUI program.
The following files are included in the test.zip file: fibo,
benchsho, benchlng, benchfn, benchmul, byteops, intops, longops, max, testlong,
matrixmu, strlen, dhamp.
The version of dcc included in this distribution
(dccsrcoo.zip and dcc32.exe) is a bit better than the first release, but it is
still broken in some cases, and we do not have the time to work on this
project at present, so we cannot provide any changes.
Comments on individual files:
- fibo (fibonacci): the small model (fibos.exe) decompiles
correctly; the large model (fibol.exe) expects an extra argument for
scanf(). This argument is the segment and is not displayed.
- benchsho: the first scanf() takes loc0 as an argument.
This is part of a long variable, but dcc does not have any clue at that stage
that the stack offset pushed on the stack is to be used as a long variable
rather than an integer variable.
- benchlng: as part of the main() code, LO(loc1) |
HI(loc1) should be displayed instead of loc3 | loc9. These two
integer variables are equivalent to the one long loc1 variable.
- benchfn: see benchsho.
- benchmul: see benchsho.
- byteops: decompiles correctly.
- intops: the du analysis for DIV and MOD is broken. dcc currently generates
code for a long and an integer temporary register that were used as part of
the analysis.
- longops: decompiles correctly.
- max: decompiles correctly.
- testlong: this example decompiles correctly given the algorithms
implemented in dcc. However, it shows that when long variables are defined and
used as integers (or long) without giving dcc any hint that this is happening,
the variable will be treated as two integer variables. This is due to the fact
that the assembly code is in terms of integer registers, and long registers
are not available on the 80286, so a long variable is equivalent to two integer
registers. dcc only learns of this through idioms such as the addition of two
long variables.
- matrixmu: decompiles correctly. Shows that arrays are not supported in
dcc.
- strlen: decompiles correctly. Shows that pointers are partially supported
by dcc.
- dhamp: this program has far more data types than what dcc recognizes at
present.
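The benchlng and testlong notes above both turn on the fact that a long is held in two 16-bit halves. The following is a minimal C sketch of that idea, not code from dcc: lo_word, hi_word, long_is_nonzero and add_long_by_halves are hypothetical names chosen for illustration, and uint32_t stands in for a 32-bit long.

```c
#include <stdint.h>

/* Hypothetical helpers, named after the LO()/HI() notation used above. */
uint16_t lo_word(uint32_t v) { return (uint16_t)(v & 0xFFFFu); }
uint16_t hi_word(uint32_t v) { return (uint16_t)(v >> 16); }

/* A zero test on a long can be done on 16-bit halves by OR-ing them
   together, which is the LO(loc1) | HI(loc1) shape mentioned for benchlng. */
int long_is_nonzero(uint32_t v) {
    return (lo_word(v) | hi_word(v)) != 0;
}

/* 32-bit addition carried out on 16-bit halves: add the low words, then
   add the high words plus the carry (the ADD / ADC pattern on x86). */
uint32_t add_long_by_halves(uint32_t a, uint32_t b) {
    uint32_t lo_sum = (uint32_t)lo_word(a) + lo_word(b);
    uint16_t carry = (uint16_t)(lo_sum >> 16);
    uint16_t hi_sum = (uint16_t)(hi_word(a) + hi_word(b) + carry);
    return ((uint32_t)hi_sum << 16) | (lo_sum & 0xFFFFu);
}
```

Without a pattern like the one in add_long_by_halves to tie the two halves together, nothing in the 16-bit code says that they belong to one variable, which is why testlong ends up with two integer variables.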
Our thanks to Gary Shaffstall for some debugging work.
Current bugs are:
- if the code generated for a single line is too long, the (static) buffer
used for that line is clobbered. Solution: make the buffer larger (currently
200 chars).
- the large memory model problem with scanf() (see the fibo comments above)
- dcc's error message shows a p option available which doesn't exist, and
doesn't show an i option which exists.
- there is a nasty problem whereby some arrays can get reallocated to a new
address, and some pointers can become invalid. This mainly tends to happen to
larger executable files. A major rewrite will probably be required to fix
this.
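As an illustration of the first bug in the list above, here is a minimal C sketch of a bounds-checked append to a fixed-size line buffer. It is an alternative to simply enlarging the buffer and is not taken from the dcc sources: append_to_line and LINE_BUF_SIZE are hypothetical names, and only the figure of 200 characters comes from the text.

```c
#include <stddef.h>
#include <string.h>

#define LINE_BUF_SIZE 200  /* buffer size quoted in the bug list above */

/* Hypothetical helper, not from the dcc sources. Appends text to the
   NUL-terminated string in line, which has room for size bytes in total.
   Returns 0 on success, or -1 if the text would not fit, in which case
   the buffer is left unchanged. */
int append_to_line(char *line, size_t size, const char *text) {
    size_t used = strlen(line);
    size_t need = strlen(text);
    if (used + need + 1 > size)
        return -1;
    memcpy(line + used, text, need + 1);  /* copies the terminating NUL too */
    return 0;
}
```

A caller that receives -1 could flush the line, or report an error, instead of writing past the end of the buffer.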
For more information refer to the thesis "Reverse
Compilation Techniques" by Cristina Cifuentes, Queensland University of
Technology, 1994, and the dcc home page.
Please note that the executable version of dcc provided in this distribution
does not necessarily match the source code provided; some changes were made
without our keeping track of them.
Using dcc
Here is a very brief summary of switches for dcc:
- a1, a2: assembler output, before and after re-ordering of input code
- c: Attempt to follow control through indirect call instructions
- i: Enter interactive disassembler
- m: Memory map
- s: Statistics summary
- v, V: verbose (and Very verbose)
- o filename: Use filename as assembler output file
For example, dcc can be invoked in the following way to produce an
assembly file (.asm) and a C-like file (.b):
dcc -a1 file.exe
If dcc encounters illegal instructions, it will attempt to enter the so-called
interactive disassembler. The idea of this was to allow commands to fix
the problem so that dcc could continue, but no such commands are implemented as
yet. (Note: the Unix versions do not have the interactive disassembler.) If you
get into this, you can get out of it by pressing ^X (control-X). Once dcc has
entered the interactive disassembler, however, there is little chance that it
will recover and produce useful output.
If dcc loads the signature file dccxxx.sig, this means that it has
not recognised the compiler library used. You can place the signatures in a
directory other than the one you are working in if you set the DCC environment
variable to point to their path. Note that if dcc can't find its signature
files, it will be severely handicapped.
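The lookup described above can be sketched in C as follows. This is an assumption about how such a lookup might be written, not dcc's actual code: build_sig_path and find_sig_path are hypothetical names, and the use of "/" as the separator is also an assumption. Only the DCC variable and the dccxxx.sig file name come from the text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not from the dcc sources. Builds the path to a
   signature file: inside dir if dir is non-NULL and non-empty, otherwise
   relative to the current directory. Returns 0 on success, or -1 if the
   result does not fit in out. */
int build_sig_path(const char *dir, const char *sig_name,
                   char *out, size_t out_size) {
    int n;
    if (dir != NULL && dir[0] != '\0') {
        size_t len = strlen(dir);
        /* Do not double the separator if dir already ends with one. */
        int has_sep = dir[len - 1] == '/' || dir[len - 1] == '\\';
        n = snprintf(out, out_size, "%s%s%s", dir, has_sep ? "" : "/", sig_name);
    } else {
        n = snprintf(out, out_size, "%s", sig_name);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Reads the DCC environment variable and delegates to build_sig_path. */
int find_sig_path(const char *sig_name, char *out, size_t out_size) {
    return build_sig_path(getenv("DCC"), sig_name, out, out_size);
}
```

For example, calling find_sig_path("dccxxx.sig", path, sizeof path) with DCC unset yields the bare file name, so the file would be looked for in the working directory.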
Note that dcc produces C source files; dcc does not produce C++ source. The
output is in a file with extension ".b".
Instructions to uncompress files
On a Unix box, uncompress and untar the dcc.tar.gz file:
gzip -d dcc.tar.gz
tar -xvf dcc.tar
This will extract all PC zipped (.zip) files. To unzip these files use the
DOS pkunzip program:
pkunzip *.zip
(You may want to check which files you need/want to look at first rather
than unzipping everything.)
dcc runs in the x86/DOS environment.
This is an HTML-ization of the original readme file.
Last updated: 4-May-2002