Creating graph of file usage in build tree

Adam Dare

Hello there,

I'm starting a little personal project and I'm not sure of the best way to tackle it. I'm hoping someone can give me a few pointers and/or some ideas on questions I should answer before I dig in.

The problem I'm trying to solve is to determine file and project dependencies in a big, sprawling codebase with tens of thousands of files in it. This codebase has evolved over time and can be quite unwieldy. There are many sub-projects inside the build tree and lots of interdependencies between them. I want to be able to map these dependencies out.

I found a library that will allow me to monitor file activity, for example file opens and creates. So my thought is that if I run a full clean build and record all of the file activity during it, I can generate a dependency graph of the entire project. This will also let me determine what I should build, and in which order, when I want to build a tiny piece of the tree. I'm sure there will be other interesting things I can do with this information. I'd also like to try to visualize the entire project and maybe create a file-change heat map from it.

My quandary is how best to record the file activity so that I can build a dependency tree. Ideally I'd like to do this in a multi-threaded way, since our build system can utilize multi-processor machines and build multiple files at once. I also want to be able to tell which files need to be built before others, which files are grouped into a project, and so on.

My current proposed approach is to record all file activity to a log file and then post-process it to generate the dependency graphs. I'm still a little hazy about all the data I need to record. I'm currently thinking I'll figure that out as I go, when I find I'm missing some important information.

Any pointers or thoughts would be greatly appreciated. :-D

Thanks,
Adam
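To make the log-then-post-process idea concrete, here is a minimal sketch in Python (3.9 or later, for `graphlib`). The log format is purely an assumption for illustration: one tab-separated record per line containing a build-step identifier (for example, the process id of the tool invocation), an operation (`read` or `write`), and a file path. The function names `build_graph` and `build_order` are hypothetical. Keying records by step is what lets interleaved records from a parallel build be sorted out afterwards.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def build_graph(log_lines):
    """Map each written file to the set of files read by the same build step.

    Assumed (hypothetical) record format: "<step>\t<read|write>\t<path>".
    """
    reads = defaultdict(set)
    writes = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        step, op, path = line.split("\t")
        if op == "read":
            reads[step].add(path)
        elif op == "write":
            writes[step].add(path)

    graph = defaultdict(set)
    for step, outputs in writes.items():
        for out in outputs:
            # Ignore reads of the step's own outputs to avoid trivial cycles.
            graph[out] |= reads[step] - outputs
    return dict(graph)


def build_order(graph):
    """Return the files in an order where every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())
```

Restricting the graph to the files reachable from a single target before sorting would give the "what do I need to build for this tiny piece" answer. A real build would probably also need timestamps and the full command line for each step, which this sketch leaves out.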

ky_rerun

Wouldn't static code analysis be a better choice? Assuming it's a C or C++ program, you could simply parse all of the includes. Another option is to have your project output the result of the preprocessor; in Visual Studio it will tell you which files it included.

a programmer traped in a thugs body

Adam Dare

Hmm, I'm not sure how hard it would be to create something that would properly process it statically. There is some thinking that needs to be done here around tracking genealogy, but once I figure that out I would think this shouldn't be too hard. Then again, I always get blindsided by some of the small details I didn't see when I was thinking about a problem at a higher level. :-)

Also, I don't think all of the files that need to be processed are C/C++ files; there are resource files as well as other types of files. And these projects aren't built in VS, they're built in our own build system based on nmake.

Adam
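If "genealogy" here means process parent/child relationships (an assumption on my part), one way to attribute file activity from helper processes back to the build step that spawned them is to walk up the parent chain. This is only a sketch: `owning_step`, `parent_of`, and `step_pids` are hypothetical names, and it assumes the monitor can report each process's parent id.

```python
def owning_step(pid, parent_of, step_pids):
    """Walk up the parent chain until a pid known to be a build step is found.

    parent_of: dict mapping a pid to its parent pid (assumed to come from the monitor).
    step_pids: set of pids that were launched directly as build steps.
    Returns the owning step's pid, or None if no ancestor is a build step.
    """
    seen = set()
    while pid is not None and pid not in seen:
        if pid in step_pids:
            return pid
        seen.add(pid)  # guard against loops caused by pid reuse in the log
        pid = parent_of.get(pid)
    return None
```

Note that pids get reused over a long build, so a real version would likely need to key on pid plus start time rather than pid alone.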

ky_rerun

Since you are using nmake I'm going to assume you are using gcc. While this won't give you your resource files, it will let you see your source dependencies. How are you using the resource files? Are they linked into some type of executable? If so, shouldn't you be able to see what you're linking in from your makefile? This is from the gcc online manual:

-M
Instead of outputting the result of preprocessing, output a rule suitable for make describing the dependencies of the main source file. The preprocessor outputs one make rule containing the object file name for that source file, a colon, and the names of all the included files, including those coming from -include or -imacros command line options.

Unless specified explicitly (with -MT or -MQ), the object file name consists of the basename of the source file with any suffix replaced with object file suffix. If there are many included files then the rule is split into several lines using \-newline. The rule has no commands.

This option does not suppress the preprocessor's debug output, such as -dM. To avoid mixing such debug output with the dependency rules you should explicitly specify the dependency output file with -MF, or use an environment variable like DEPENDENCIES_OUTPUT (see Environment Variables). Debug output will still be sent to the regular output stream as normal.

Passing -M to the driver implies -E, and suppresses warnings with an implicit -w.
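As a rough illustration of how that output could be consumed, here is a small Python sketch that turns make-style rules into a target-to-prerequisites mapping. The function name `parse_make_rules` is made up for this example, and the sketch assumes paths contain no colons or escaped spaces, so Windows drive letters such as `C:\` would need extra handling.

```python
def parse_make_rules(text):
    """Parse make rules of the form 'target: dep1 dep2 \\<newline> dep3'.

    Returns a dict mapping each target to the set of its prerequisites.
    Assumes no colons or escaped spaces inside file names.
    """
    # Join lines that were split with a backslash followed by a newline.
    joined = text.replace("\\\n", " ")
    deps = {}
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, prereqs = line.partition(":")
        deps.setdefault(target.strip(), set()).update(prereqs.split())
    return deps
```

For example, a rule such as `main.o: main.c defs.h` would come back as `{"main.o": {"main.c", "defs.h"}}`.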


Adam Dare

Actually we build using the VC compiler, just not inside VS or a VS project. That system doesn't scale well for our needs. While this is an interesting approach, I'm still favoring monitoring the file activity externally to the compiler, for several reasons:

1. We have C/C++ and C# projects in our tree, and possibly other languages as well that I haven't had to deal with yet.
2. We deal with a number of different build tools that may not have output options like the more mature C/C++ compilers do.
3. Since I have a number of different build tools to track, dealing with each one separately and maintaining the code that processes each tool's output sounds like a big, fragile task.

I feel that taking a more build-tool-agnostic approach to gathering this information will make the tool more reliable and take less maintenance work. I won't have to track down all of the different build tools we use and the command-line options needed to gather the information, and then figure out how to process each tool's output to get what I need.

Adam