Learning a big new codebase
-
dandy72 wrote:
...while keeping in mind that the actual product probably deviates substantially from the original documentation.
Hence the "If you are lucky to work at a company that has decent documentation practices."
It was broke, so I fixed it.
I've been at this for 40 years and have yet to find a company that had more than completely minimal documentation at a level that could help a developer. It has always been a learn-as-you-go process. Most developers do NOT document their work.
-
Do you have any recommended strategies for a junior developer when attempting to learn a large new codebase? One of my goals is to make some commits on something like ASP.NET MVC (.NET Core now), Entity Framework, Node.js, or some other major project on GitHub. Not surprisingly, however, when I open the project file for these, it can be tough trying to figure out where to even start. Of course I can view the issues and try my hand at solving one, but I found that even that often requires a general idea of the project's moving parts. Do you have any suggestions or resources on breaking down a big project like this into bite-sized chunks that can be learned over time in hopes of a serious contribution? One strategy I've tried is looking at the classes that I am familiar with from using the software and also looking at the unit tests to get an idea of what's happening. Thanks.
As always, working backwards is a good approach. Find previously FIXED items and review the posted code changes that fixed them. I would recommend being able to build and test the previous version and verify the bug. Apply the fix. Verify the bug is gone. Once you get decent at that, get realistic. It takes approximately 5,000 hours to master a new skill. Assuming you have mastered programming in general, let's assume a large code base will take you about 1,000 hours to reach a solid basic understanding (half a work year). Yes, it is easy to jump in and hack away, but actually mastering a code base is another matter. This gets to the REASON others suggest you support a code base that you already use, like, and would like to extend. BTW, as you set up your environment to test and validate prior updates, consider reviewing and enhancing the documentation that helps others get to where you got to.
-
I've been at this for 40 years and have yet to find a company that had more than completely minimal documentation at a level that could help a developer. It has always been a learn-as-you-go process. Most developers do NOT document their work.
In the medical device industry if we do not have documentation, you will not be able to sell your device. It is a requirement and for good reason. Would you want to be on the operating table being monitored by devices with software of unknown provenance? "Most developers do NOT document their work." and we wonder why the quality of the software out there sucks. That's called winging it and in my opinion it is unprofessional and if a developer is unable or unwilling to maintain at least some level of documentation I would not be inclined to hire them or to keep them in my employ.
It was broke, so I fixed it.
-
S Houghtelin wrote:
read the project documents
...while keeping in mind that the actual product probably deviates substantially from the original documentation.
The sad part in all the comments on this topic is that not one suggested writing some documentation for the project. Documentation is always someone else's responsibility. Two years ago I was handed 100KLOC of undocumented but production critical cowboy code. Programmer who wrote it was adamant that "the code is self documenting". It wasn't. It took 18 months to document it to the point where it could be maintained...barely. If you REALLY want to contribute to a project, write something other than code. "Everyone complains about the weather, but no one does anything about it."
-
If you can find an issue that does not have a UnitTest, contribute a UnitTest that reproduces the issue.
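As a rough illustration of a test that reproduces an issue, here is a minimal Python sketch. Everything in it is hypothetical: `slugify_before_fix`, `slugify_after_fix`, and the "double space" issue are invented for the example and do not come from any of the projects mentioned in this thread. The point is only that the same test fails against the old code and passes once the fix is applied.

```python
import unittest


def slugify_before_fix(title: str) -> str:
    """Hypothetical function as it stood when the issue was reported."""
    return title.lower().replace(" ", "-")


def slugify_after_fix(title: str) -> str:
    """Hypothetical function after the fix: runs of whitespace collapse."""
    return "-".join(title.lower().split())


def make_regression_test(slugify):
    """Build a test case bound to the implementation under test."""

    class DoubleSpaceIssueTest(unittest.TestCase):
        def test_double_space_yields_single_hyphen(self):
            # The invented issue: two spaces produced two hyphens.
            self.assertEqual(slugify("Hello  World"), "hello-world")

    return DoubleSpaceIssueTest


def passes(test_case) -> bool:
    """Run one test case class and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

Running `passes(make_regression_test(slugify_before_fix))` shows the bug is reproduced (the test fails), and running it against `slugify_after_fix` shows the bug is gone. In a real project you would simply add the test method to the existing test suite.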
I found the Doxygen tool to be very helpful. It can show some call traces and is very convenient for navigating objects.
-
It's admirable wanting to get involved and commit to an open source project, but my suggestion would be to only get involved in a project if it's something you use/reference as part of some other development project you are working on, and there are improvements or fixes that would benefit your own project that you believe would also be of benefit to others.
Wastedtalent wrote:
only get involved in a project if ...
I disagree. There is great value in studying large, commercial-grade software for its own sake. Even if the OP never commits but just spends hours or days spelunking one of the codebases he mentions, he is bound to be enriched by the process. Many of the techniques of enterprise-scale coding can't be taught in books.
-
In the medical device industry if we do not have documentation, you will not be able to sell your device. It is a requirement and for good reason. Would you want to be on the operating table being monitored by devices with software of unknown provenance? "Most developers do NOT document their work." and we wonder why the quality of the software out there sucks. That's called winging it and in my opinion it is unprofessional and if a developer is unable or unwilling to maintain at least some level of documentation I would not be inclined to hire them or to keep them in my employ.
It was broke, so I fixed it.
That's certainly good to know. I wasn't talking about end-user documentation, though, I was talking about the documentation that would help a developer. I wonder if the code behind those medical devices is documented any better than what I've seen in a dozen or so other industries?
-
Wastedtalent wrote:
only get involved in a project if ...
I disagree. There is great value in studying large, commercial-grade software for its own sake. Even if the OP never commits but just spends hours or days spelunking one of the codebases he mentions, he is bound to be enriched by the process. Many of the techniques of enterprise-scale coding can't be taught in books.
I agree, I also think studying and getting involved in are two very different things, and the OP was talking about making commits.
-
That's certainly good to know. I wasn't talking about end-user documentation, though, I was talking about the documentation that would help a developer. I wonder if the code behind those medical devices is documented any better than what I've seen in a dozen or so other industries?
We have to comply with GMP, UL, ISO, FDA, CE and EU standards among others. We are required to have our documentation internally and externally reviewed and accepted by the regulatory bodies. Every aspect of the product needs to go through risk and hazard analysis and QA tested using the very documents the software developer wrote. If the software and document do not match, it needs to be corrected and retested. This doesn't mean bugs can't get through, but certainly the obvious glaring stuff rarely does. This is why it takes forever and a massive amount of $₤€ to get a new product out. End user documentation is also very regulated, but thankfully I don't have to deal with that aspect.
It was broke, so I fixed it.
-
We have to comply with GMP, UL, ISO, FDA, CE and EU standards among others. We are required to have our documentation internally and externally reviewed and accepted by the regulatory bodies. Every aspect of the product needs to go through risk and hazard analysis and QA tested using the very documents the software developer wrote. If the software and document do not match, it needs to be corrected and retested. This doesn't mean bugs can't get through, but certainly the obvious glaring stuff rarely does. This is why it takes forever and a massive amount of $₤€ to get a new product out. End user documentation is also very regulated, but thankfully I don't have to deal with that aspect.
It was broke, so I fixed it.
Good to hear, and also makes a lot of sense. Most of the organizations I worked with were not producing software that could wind up being "life critical" like that. Glad to hear that someone does it.
-
In GREP we trust. Use GREP, find in files, or find usages to see all the references to a particular class and its public methods. Start the new dev with a small task, such as a bug fix or minor enhancement. Then have the new dev document every class and method that contributes in some way to the scenario. Have the new dev document every other class and method that depends on the code that is changed, all the way up to the UI or interface. Also, I absolutely concur with the idea of debugging and examining the stack. If unit tests and integration tests are available then run these in the debugger. If a developer went to the trouble to write unit tests, then it must be important.
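If you don't have grep or an IDE's "find usages" to hand, the same idea can be sketched in a few lines of Python. This is only an illustration: the function name `find_usages` and the default extension list are my own choices, and a plain text search like this will also match comments and strings, which a real "find usages" would not.

```python
import re
from pathlib import Path


def find_usages(root, identifier, extensions=(".cs", ".js", ".py")):
    """Return (path, line_number, line) for each whole-word match.

    A plain-text search, so it cannot tell code from comments or strings.
    """
    # \b keeps "OrderService" from matching inside "OrderServiceTests".
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # "OrderService" is a made-up class name; substitute one you know.
    for path, number, line in find_usages(".", "OrderService"):
        print(f"{path}:{number}: {line}")
```

The list of hits is a reasonable starting point for the "document every class and method that contributes" exercise described above.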
-
I agree, I also think studying and getting involved in are two very different things, and the OP was talking about making commits.
I also suggest that getting involved in that depth with a project that you use regularly will increase your interest and commitment to the process. Doing work on a project that you have little interest in, just for the sake of it, will soon feel like a thankless task. I've only ever contributed to projects that I have a direct interest in, because the first step in contributing to any product is using it.
-
Could I also echo the responses from some of the other repliers that there is a chronic shortage of documenters for most open source products? They are like rocking horse poop. If you want to hone your coding skills and can find a project that deeply interests you, then hack away. If you are just looking to contribute to a project that matters to you, then documenters are always welcomed with open arms.
-
I think your first impulse (find and solve an issue) was the right one. At least you have a "goal" in mind; the rest ("reading code") gets old pretty fast. Ultimately, you will find out your value is in seeing the big picture quickly, and prioritizing what needs to be done. A lot of code never gets executed or deals with fringe cases; better to focus on the stuff that actually gets run; i.e. the "buggy" parts.
-
The sad part in all the comments on this topic is that not one suggested writing some documentation for the project. Documentation is always someone else's responsibility. Two years ago I was handed 100KLOC of undocumented but production critical cowboy code. Programmer who wrote it was adamant that "the code is self documenting". It wasn't. It took 18 months to document it to the point where it could be maintained...barely. If you REALLY want to contribute to a project, write something other than code. "Everyone complains about the weather, but no one does anything about it."
Documentation can be self documenting as well. The number of times I despair when I see a summary of a method which basically repeats the method name! Code should be simple and self-explanatory as to the implementation. If it isn't, then it probably needs to be refactored. A method can explain its function in its name; there is no need to repeat it (as an example, I saw the documentation for an attribute "rtpHeaderExpected" given as "expects an rtp header"). Documentation is useful when it explains the why of code, not the what (which is what the code should explain). So yes to documentation, but only when it's useful!
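To make the "why, not what" distinction concrete, here is a small Python sketch built around the rtpHeaderExpected example. The class `StreamConfig`, the function `payload_offset`, and above all the stated reason in the comment are invented for illustration; I have no idea what the real reason was in the code I saw.

```python
from dataclasses import dataclass

# Size of the fixed RTP header when there are no CSRC entries or extensions.
RTP_FIXED_HEADER_BYTES = 12


@dataclass
class StreamConfig:
    # Unhelpful comment (repeats the name):
    #   rtp_header_expected: expects an rtp header
    #
    # More useful comment (explains why; the scenario is hypothetical):
    #   Some senders in this made-up setup forward the bare payload with
    #   the RTP header already stripped. Set this to False for those
    #   sources so the first 12 bytes of payload are not skipped.
    rtp_header_expected: bool = True


def payload_offset(config: StreamConfig) -> int:
    """Return how many leading bytes to skip before the payload starts."""
    return RTP_FIXED_HEADER_BYTES if config.rtp_header_expected else 0
```

The second comment tells a maintainer when to change the flag and what goes wrong if it is set incorrectly, which the name alone cannot do.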
-
In the medical device industry if we do not have documentation, you will not be able to sell your device. It is a requirement and for good reason. Would you want to be on the operating table being monitored by devices with software of unknown provenance? "Most developers do NOT document their work." and we wonder why the quality of the software out there sucks. That's called winging it and in my opinion it is unprofessional and if a developer is unable or unwilling to maintain at least some level of documentation I would not be inclined to hire them or to keep them in my employ.
It was broke, so I fixed it.
S Houghtelin wrote:
In the medical device industry if we do not have documentation, you will not be able to sell your device
S Houghtelin wrote:
"Most developers do NOT document their work." and we wonder why the quality of the software out there sucks
The only way comparing the software industry with the medical device industry could be fair is if software was priced to match said medical devices. Don't blame developers for not documenting their code. That decision is not made by them.
-
S Houghtelin wrote:
In the medical device industry if we do not have documentation, you will not be able to sell your device
S Houghtelin wrote:
"Most developers do NOT document their work." and we wonder why the quality of the software out there sucks
The only way comparing the software industry with the medical device industry could be fair is if software was priced to match said medical devices. Don't blame developers for not documenting their code. That decision is not made by them.
Sadly this is very true. When there is a clock ticking down the profit margins, documentation is usually the first casualty.
We're philosophical about power outages here. A.C. come, A.C. go.
-
Since you're a novice, first focus on what you are comfy with. Pick that layer. Pick up a tool like NDepend or Nitriq and see how the layers interact. Then, and only then, start playing on the keyboard.
-
Here are a few tips.
If nothing else, fix layout issues (indentation, spacing, etc.) and add sensible comments where they help. The act of tidying up layout and having to think about what a small section of code is doing, in its own right, will help build your understanding of the bigger picture. From there it probably won't take long for you to start spotting refactoring opportunities.
If you do decide to make changes, start with the small trivial things, since these will often be overlooked or tolerated for the sake of the big things. Build up a testing regime for your changes BEFORE you make the changes.
Don't focus on code structure or control too much. That will become obvious. The key to any code base is how it organizes its data and moves it around. This you can analyse and diagram. Remember, fundamentally all any software is really about is moving data from A to B. With this in mind, pay special attention to the interfaces between modules, components and systems. This is likely where the most problems are, especially when either side of the interface has been independently developed. Also look for places where data is transformed from one form to another, e.g. conversions and lookups. These are another source of faults.
If you are able to run the software, another way of gaining understanding is to include detailed logging/tracing of the software's operation as it is running. In this context, look to trace the initial state of variables and when variables change, the function calls (including explicit variable values) and function returns, and error events in the code. Those three categories of logging should be enough for you to home in on most problems with the code when it is running. This is of course verbose and has performance implications, so make sure it can be turned off or removed from the release product entirely.
Finally, study design patterns and identify where they have been used in the code, either intentionally or unwittingly. You may be lucky and the patterns may be explicitly named, e.g. WidgetFactory or WangleAdaptor. Design patterns are not the be-all and end-all, but they are a useful shorthand for common development problems.
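For the logging/tracing tip above, here is one possible sketch in Python using the standard `logging` module. The decorator name `traced`, the logger name "trace", and the temperature function are all made up for the example. It covers function calls, returns, and error events; tracing individual variable changes would need extra log statements inside the functions themselves.

```python
import functools
import logging

logger = logging.getLogger("trace")  # hypothetical logger name


def traced(func):
    """Log calls (with argument values), returns, and errors of func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # When tracing is switched off, skip straight to the real call.
        if not logger.isEnabledFor(logging.DEBUG):
            return func(*args, **kwargs)
        logger.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Records the traceback, then lets the error propagate.
            logger.exception("error in %s", func.__name__)
            raise
        logger.debug("return %s -> %r", func.__name__, result)
        return result

    return wrapper


@traced
def convert_celsius_to_fahrenheit(celsius):
    """A data transformation point, the kind of place worth tracing."""
    return celsius * 9 / 5 + 32
```

To switch tracing on, attach a handler and call `logger.setLevel(logging.DEBUG)`; setting the level to `logging.WARNING` turns it off again, so the verbose output never reaches a release build unless you ask for it.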