Montag, 21. Juli 2008

Language.C: Analysing Definitions

Finally I have a (more or less) working implementation for analysing declarations and definitions in Language.C.
From a practical point of view this means one can now easily perform file-scope analysis, e.g. to analyze header files.
As the API needs some more revisions, and there are a few bugs to hunt left, I'll rather present an working example now, and defer explanations to the next post.
Consider the following C file

typedef struct t { char x; short y; } __attribute__((packed)) T;
union u { T x; T* y; };
struct s {
struct k { short b1 : 8, b2: 9, b3: 8, b4 : 7;} x;
union u a,*b;
};

We'd like to know which composite types (struct or union) are defined in this file, and how many bytes each of them occupies on our system.
Instead of guessing we'll ask gcc, in the same way as autoconf does.

> module Main where
> import System.Environment ; import System.IO
> import Control.Arrow ; import Control.Monad
> import Data.Map (Map) ; import qualified Data.Map as Map
> import Data.Maybe ; import Data.Function (fix)
> import Data.Generics

The public API of Language.C consists of a just a few modules, and Language.C.hs itself exports everything needed for most tasks

> import Language.C -- Language.C.{Syntax,Pretty,Parser,InputStream}
> import Language.C.Analysis -- the new analysis modules
> import Language.C.System.GCC -- preprocessor used
> import Language.C.Analysis.Export -- [alpha, hence an extra import]

The overall strategy will be the following: We'll parse and analyze the file, get the definitions of structs and unions we're interested in, replace typedefs and enums in the type definitions, and then generate code for gcc.

> main :: IO ()
> main = do
> let usage = error "Example Usage: ./ComputeSize -I/usr/include my_file.c"
> (opts,c_file) <- liftM (init &&& last) getArgs
>
> let compiler = newGCC "gcc"
> ast <- parseFile compiler Nothing opts c_file >>= checkResult "[parsing]"
>
> (globals,warnings) <- (runTrav_ >>> checkResult "[analysis]") $ analyseAST ast
> mapM (hPutStrLn stderr . show) warnings
>
> print $ pretty (generateSizeTests c_file globals)
> where
> checkResult :: (Show a) => String -> (Either a b) -> IO b
> checkResult label = either (error . (label++) . show) return

To generate the C code, we get the composite type definitions we're interested in and those we need to compute the size. Then we define the types and finally emit the main function.

> generateSizeTests :: FilePath -> GlobalDecls -> CTranslUnit
> generateSizeTests c_file globals =
> translUnit $
> map defineComp referenced_comps ++
> [ genSizeTest (Map.elems comps_of_interest) ]
> where
> comps = Map.mapMaybe fromComp (gTags globals)
> comps_of_interest = compsOfInterest c_file comps
> referenced_comps = computeRefClosure comps comps_of_interest
> fromComp (CompTag struct_union) = Just struct_union
> fromComp (EnumTag _) = Nothing
> translUnit = flip CTranslUnit intn
> intn = mkUndefNodeInfo
We're only interested in those definitions in the given file which have a name
> compsOfInterest :: FilePath -> Map SUERef CompType -> Map SUERef CompType
> compsOfInterest c_file = Map.filterWithKey isNamed . Map.filter isInCFile
> where
> isInCFile = ((==c_file) . fileOfNode)
> isNamed sue_ref _ = case sue_ref of AnonymousType _ -> False; _ -> True
computeRefClosure is a little more complicated, so we'll continue with defining the composite types. As an enumeration always has int type, we simply replace its type with TyInt; furthermore we dereference all typedefs, using Data.Generics (everywhere).

> defineComp :: CompType -> CExtDecl
> defineComp ty = CDeclExt (CDecl (map CTypeSpec (exportCompType $ derefTypeDefs ty)) [] intn)
> where
> derefTypeDefs ty = everywhere (mkT derefTypeDef `extT` replaceEnum) ty
> derefTypeDef (TypeDefType (TypeDefRef _ (Just ty) _)) = ty
> derefTypeDef ty = ty
> replaceEnum _ = TyIntegral TyInt
> replaceEnum dty = dty
Generating the main file without hacking would currently be to much work, as there isn't much support for code generation.
Quasi-Quoting would really help here - for now we `simulate' it using pretty and parse...

> genSizeTest :: [CompType] -> CExtDecl
> genSizeTest tys = either (error.show) fromExtDecl $
> parseC (inputStreamFromString test) (Position "autogenerated" 1 1)
> where
> fromExtDecl (CTranslUnit [decl] _ ) = decl
> fromExtDecl (CTranslUnit decls _) = error $ "Expected declaration"
> test = "int main() {" ++ concatMap checkSize tys ++ "}"
> checkSize (CompType sue_ref tag _ _ _) =
> let tag_str = show tag ++ " " ++ show sue_ref in
> "printf(\""++ tag_str ++": %d\\n\",sizeof(" ++ tag_str ++ ")); ";
Running this program on the above input results in the following C file

struct k {
short b1 : 8; short b2 : 9; short b3 : 8; short b4 : 7;
};
struct __attribute__((packed)) $1749 {
char x; short y;
};
union u {
__attribute__((packed)) struct $1749 x;
__attribute__((packed)) struct $1749 * y;
};
struct s {
struct k x; union u a; union u * b;
};
int main()
{
printf("struct k: %d\n", sizeof(struct k));
printf("struct s: %d\n", sizeof(struct s));
printf("struct t: %d\n", sizeof(struct t));
printf("union u: %d\n", sizeof(union u));
}
Now we can use gcc to compute the size of the structs:

/ComputeSize compute_size.c | gcc -x c -o compute_size - && ./compute_size;
struct k: 6
struct s: 16
struct t: 3
union u: 4

The missing computation of directly nested structs and unions concludes:

> computeRefClosure :: Map SUERef CompType -> Map SUERef CompType -> [CompType]
> computeRefClosure all_comps initial_comps =
> fixCont addReferenced ([], Map.elems initial_comps, (Map.empty,Map.empty))
> where
> fixCont f = fix $ \close args ->
> let args'@(result',todo',_) = f args
> in (if null todo' then reverse result' else close args')
> addReferenced (result,[],ms) = (result,[],ms)
> addReferenced (result,(t:ts),(visit,enter))
> | Map.member (sueRef t) enter = (result,ts,(visit,enter))
> | Map.member (sueRef t) visit = (t:result,ts,(visit,Map.insert (sueRef t) t enter))
> | otherwise = let refd = referenced t
> in (result, refd++(t:ts), (Map.insert (sueRef t) t visit,enter))
> referenced (CompType _ _ members _ _) = mapMaybe getRefdComp members
> getRefdComp memberDecl = fromDirectType (declType memberDecl) >>= fromCompTy
> fromCompTy (TyComp (CompTypeDecl ref _ _))
> | (Just r) <- Map.lookup ref all_comps = Just r
> | otherwise = error $ "Internal Error: Could not find definition for "++show ref
> fromCompTy _ = Nothing
> fromDirectType (DirectType tyname _ _) = Just tyname
> fromDirectType (TypeDefType (TypeDefRef _ ref _)) = (fromDirectType.fromJust) ref
> fromDirectType _ = Nothing
I hope that this example gives an approximate feeling what the library can achieve right now.

Keine Kommentare: