This a project that allows developers to explore and test C-APIs using a read
eval print loop, also known as a REPL.

doc/img/hello-world.gif {focus_keyword} hexagonal-sun/bic hello world

Dependencies

BIC’s run-time dependencies are as follows:

  • GNU Readline
  • GNU MP

To build BIC, you’ll need:

  • GNU Flex
  • GNU Bison
  • GNU Automake
  • GNU M4

Please ensure you have these installed before building bic. The following
command should install these on a Debian/Ubuntu system:

apt-get install build-essential libreadline-dev autoconf-archive libgmp-dev expect

Installation

You can compile and install bic with the following commands:

autoreconf -i
./configure --enable-debug
make
make install

Usage

REPL

When invoking bic with no arguments the user is presented with a REPL prompt:

BIC>

Here you can type C statements and #include various system headers to
provide access to different APIs on the system. One thing to note is that
statements can be entered directly into the REPL; there is no need to define
a function for them to be evaluated. Say we wish to execute the following C
program:

 #include stdio.h>

int main()
{
    FILE *f = fpoen("out.txt", "w");
    fputs("Hello, world!n", f);
    return 0;
}

We can do this on the REPL with BIC using the following commands:

BIC> #include 
BIC> FILE *f;
f
BIC> f = fopen("test.txt", "w");
BIC> fputs("Hello, World!n", f);
1
BIC>

This will cause bic to call out to the C-library fopen() and fputs()
functions to create a file and write the hello world string into it. If you
now exit bic, you should see a file test.txt in the current working
directory with the string Hello, Worldn contained within it.

Notice that after evaluating an expression bic will print the result of
evaluation. This can be useful for testing out simple expressions:

BIC> 2 * 8 + fileno(f);
19

Evaluating Files

If you pass bic a source file as a command line argument it will evaluate
it, by calling a main() function. For example, suppose we have the file
test.c that contains the following:

int printf(const char *s, ...);

int factorial(int n)
{
  if (!n)
  {
    return 1;
  }

  return n * factorial(n - 1);
}

int main()
{
  printf("Factorial of 4 is: %dn", factorial(4));

  return 0;
}

We can then invoke bic with test.c as a parameter to evaluate it:

doc/img/eval-file.gif {focus_keyword} hexagonal-sun/bic eval file

You can also use a special expression: ; in your source code to make
bic drop you into the repl at a particular point in the file evaluation:

doc/img/repl-interrupt.gif {focus_keyword} hexagonal-sun/bic repl interrupt

Exploring external libraries with the REPL

You can use bic to explore the APIs of other libraries other than libc. Let’s
suppose we wish to explore the Capstone library, we pass in a -l option to
make bic load that library when it starts. For example:

doc/img/capstone.gif {focus_keyword} hexagonal-sun/bic capstone

Notice that when bic prints a compound data type (a struct or a union),
it shows all member names and their corresponding values.

Implementation Overview

Tree Objects

At the heart of bic’s implementation is the tree object. These are generic
objects that can be used to represent an entire program as well as the
current evaluator state. It is implemented in tree.h and tree.c. Each
tree type is defined in c.lang. The c.lang file is a lisp-like
specification of:

  • Object name, for example T_ADD.
  • A human readable name, such as ~”Addition”~.
  • A property name prefix, such as tADD.
  • A list of properties for this type, such as ~”LHS”~ and ~”RHS”~.

The code to create an object with the above set of attributes would be:

(deftype T_ADD "Addition" "tADD"
         ("LHS" "RHS"))

Once defined, we can use this object in our C code in the following way:

tree make_increment(tree number)
{
    tree add = tree_make(T_ADD);

    tADD_LHS(add) = number;
    tADD_RHS(add) = tree_make_const_int(1);

    return add;
}

Notice that a set of accessor macros, tADD_LHS() and tADD_RHS(), have
been generated for us to access the different property slots. When
--enable-debug is set during compilation each one of these macros expands
to a check to ensure that when setting the tADD_LHS property of an object
that the object is indeed an instance of a T_ADD.

The c.lang file is read by numerous source-to-source compilers that
generate code snippets. These utilities include:

  • gentype: Generates a list of tree object types.
  • gentree: Generates a structure that contains all the property data for
    tree objects.
  • genctypes: Generates a list of C-Type tree objects – these represent the
    fundamental data types in C.
  • genaccess: Generate accessor macros for tree object properties.
  • gengc: Generate a mark function for each tree object, this allows the
    garbage collector to traverse object trees.
  • gendump: Generate code to dump out tree objects recursively.

Evaluator

The output of the lexer & parser is a tree object hierarchy which is then
passed into the evaluator (evaluator.c). The evaluator will then
recursively evaluate each tree element, updating internal evaluator state,
thereby executing a program.

Calls to functions external to the evaluator are handled in a
platform-dependent way. Currently x86_64 and aarch64 are the only supported
platforms and the code to handle this is in the x86_64 and aarch64
folders respectively. This works by taking a function call tree object
(represented by a T_FN_CALL) from the evaluator with all arguments
evaluated and marshalling them into a simple linked-list. This is then
traversed in assembly to move the value into the correct register according
to the x86_64 or aarch64 calling-conventions and then branching to the
function address.

Parser & Lexer

The parser and lexer are implemented in parser.m4 and lex.m4
respectively. After passing through M4 the output is two bison parsers and
two flex lexers.

The reason for two parsers is that the grammar for a C REPL is very
different than that of a C file. For example, we want the user to be able to
type in statements to be evaluated on the REPL without the need for wrapping
them in a function. Unfortunately writing a statement that is outside a
function body isn’t valid C. As such, we don’t want the user to be able to
write bare statements in a C file. To achieve this we have two different set
of grammar rules which produces two parsers. Most of the grammar rules do
overlap and therefore we use a single M4 file to take care of the
differences.

Read More

LEAVE A REPLY

Please enter your comment!
Please enter your name here