Axioms of Software Business - II
No final decisions yet, just thinking aloud
S V Ramu (2002-05-25)
Prelude
This is a sequel to the previous article (Part I, Axioms of Software Business), along similar lines. Beware: both are just collections of rambling, open thoughts. Much more thinking is required, and will be done. The idea of such half-baked articles is to share early impressions with others who are similarly inclined, and thus evolve a robust model for software development. Once this standardization happens, a few FAQs, and ideally a few CheckLists too, can be included.
Version Control
Some time back we considered shunning version numbers altogether and keeping only the date of release (which may even be predetermined). But that doesn't allow for the tracks model. Maybe we could treat each new track as a newly named product, like Product-1, Product-2 etc., and continue to use dates for each update within it. But version numbers do give a sense of progress and a succinct way to state the degree of change (through the major, minor, and micro numbers). What about project code names (like Kestrel, Merlin, and Tiger for Java)? These are good only as an add-on to version numbers, for glamour and community feeling, and cannot stand on their own.
We can imagine the software bundle as living in many tracks, though usually there is only one. A track is a bundle of code that can have incremental versions of changes. For example, we could start with the main track and version that code linearly, with versions typically running 0.1, ..., 0.5, ..., 1.0, 1.0.1, ..., 1.1, 1.1.1, ... Now suppose we suddenly want an architectural change in version 2.0 but cannot abandon the successfully running 1.x series. What do we do? We branch off the source for 2.0 but continue to maintain the 1.x series. So now we have to maintain two different sources and keep them in sync: a vital bug fix in 1.x, if relevant, has to be implemented in 2.0 too. This is sometimes unavoidable. The goal could be to maintain only one track initially, and ideally to have only two just before each new major version (and for a short while after).
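To make the bookkeeping cost of a second track concrete, here is a minimal sketch, assuming each track simply records which bug fixes it already contains. The class TrackRegistry, its methods, and fix identifiers such as "BUG-2" are hypothetical illustrations, not an existing tool. Branching copies the fixes already present, so after the branch the registry can list which live tracks still lack a given fix; whether that fix is relevant there remains a human decision.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each track ("1.x", "2.x") records the fixes it contains.
class TrackRegistry {
    // LinkedHashMap keeps tracks in the order they were opened.
    private final Map<String, Set<String>> fixesByTrack = new LinkedHashMap<>();

    void openTrack(String track) {
        fixesByTrack.putIfAbsent(track, new HashSet<>());
    }

    // Branching starts the new track with every fix the source track already has.
    void branch(String from, String to) {
        fixesByTrack.put(to, new HashSet<>(fixesByTrack.get(from)));
    }

    void applyFix(String track, String fixId) {
        fixesByTrack.get(track).add(fixId);
    }

    // Live tracks that do not yet contain the fix: candidates for porting.
    List<String> tracksMissing(String fixId) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : fixesByTrack.entrySet()) {
            if (!e.getValue().contains(fixId)) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    // Abandoning an old track removes it from the porting checks.
    void retire(String track) {
        fixesByTrack.remove(track);
    }

    public static void main(String[] args) {
        TrackRegistry r = new TrackRegistry();
        r.openTrack("1.x");
        r.applyFix("1.x", "BUG-1");
        r.branch("1.x", "2.x");      // 2.x inherits BUG-1
        r.applyFix("1.x", "BUG-2");  // fixed only on the old track so far
        System.out.println(r.tracksMissing("BUG-2")); // prints [2.x]
    }
}
```

With one track the "missing" list is always empty; every extra track is another place a fix can be forgotten, which is why keeping to one or two tracks matters.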
Or shall we completely avoid any type of forking of the project (unless it is a new product), and make the new architectural prototype a separate subproject? This sounds good, since it avoids doubts over the new track's version number (should it be 2.0, 1.9, or something like 2.0pre1?). Say we want to make a future version of our product (say, BingoEditor) with a sleek memory model; we could branch it off as a subproject called SleekMemoryPrototype 0.1 (instead of forking the whole project as BingoEditor 2.0pre1). Will this be good enough for all types of change? It could be, but it will not give the benefit of introducing the new design as a new version of the product, which encourages people to test it by actually using it. A serious lacuna.
What should the initial version be? Symbolically we could start with 0.1, and later jump versions if we are well ahead. This also allows some breathing room before the first major release. How should we increment the version numbers? First of all, we should accept that the parts of a version are independent integers, not a decimal fraction. Thus 1.12 is a valid version number, and it comes after 1.11, not between 1.1 and 1.2. But 1.1 and 1.12 cannot be considered successive versions (unless we jumped). This way we can keep our unit of change constant, instead of squeezing major future changes into ever smaller increments just to avoid arriving at the next whole number. For example, instead of numbering future changes 1.9, 1.99, and 1.999 (though every change is equally sized), we could just as well say 1.9, 1.10, 1.11, and so on.
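The "independent integers, not decimals" rule is easy to get wrong in tooling, since a naive string or floating-point comparison puts 1.10 before 1.9. A minimal sketch, assuming plain dotted numeric versions with no alpha/beta tags; the class and method names are hypothetical. Missing trailing parts are treated as 0, so 1.1 and 1.1.0 compare equal.

```java
import java.util.Arrays;

// Sketch: dotted versions compared part by part as integers.
class VersionOrder {
    static int[] parse(String v) {
        return Arrays.stream(v.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // A missing part counts as 0, so "1.1" equals "1.1.0".
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Next minor version with a constant unit of change: 1.9 -> 1.10, never 1.99.
    static String nextMinor(String v) {
        int[] p = parse(v);
        return p[0] + "." + (p.length > 1 ? p[1] + 1 : 1);
    }

    public static void main(String[] args) {
        System.out.println(compare("1.10", "1.9") > 0); // true
        System.out.println(nextMinor("1.9"));           // 1.10
    }
}
```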
So now, after this round-trip analysis, it does seem that tracks are important. But remember that 2.0 will not wait for 1.x to merge into it after, say, 1.9. Imagine that 2.0 has a life of its own: at some point all the relevant bug fixes of 1.x will be incorporated into 2.0, and 1.x will be abandoned, i.e. no longer mentioned. It is like branching off a new product that is initially shaky but superior in design, while the old one is strong for now but has a shaky future. A good transition model. So we do need tracks, and the resources to maintain them, depending on the customers' interest.
In this light, to allow 2.0 as a legitimate starting point, we must allow 0.0 too as a starting version (and not just 0.1). Of course, 0.0 could be the plain initial code as it comes into the project's fold (it could already be functioning), and 0.1 could be the first code release. Is anybody using the version 0.0 idea? Any pitfalls here?
How should we use 'alpha' and 'beta'? These are useful in giving an idea of the product's stage. They also imply a countdown model of numbering. For example, version 1.9 could be called 2.0beta, 1.5 could be called 2.0alpha1, 1.6 could be 2.0alpha2, and so on. This gives a sense of direction to the project. Maybe we can include 'pre/test/trial' in this soup and imagine that any release (1.0 or 1.1) goes through three stages, namely 'pre' (starting), 'alpha' (growing), and 'beta' (stabilizing). We can always attach numbers to these, like pre1, pre2, ..., alpha1, alpha2, ..., and beta1, beta2, ..., finishing with 1.0 or 1.1, or even 1.1.1.
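The pre, alpha, beta countdown toward a release can be captured as an ordering rule. A minimal sketch, assuming tags are written directly after the number as in "2.0alpha1" or "2.0beta" (a missing stage number counts as 0, and a bare "2.0" is the final release); the class StagedVersion and this exact syntax are hypothetical. Versions are compared first by target release, then by stage in declaration order of the enum, then by stage number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the pre -> alpha -> beta -> final countdown toward a release.
class StagedVersion implements Comparable<StagedVersion> {
    enum Stage { PRE, ALPHA, BETA, FINAL } // declaration order = release order

    // Assumed syntax: digits with dots, then optional stage tag and number.
    private static final Pattern FORMAT =
            Pattern.compile("(\\d+(?:\\.\\d+)*)(?:(pre|alpha|beta)(\\d*))?");

    final String target; // release being counted down to, e.g. "2.0"
    final Stage stage;
    final int step;      // the 1 in alpha1; 0 when no number is given

    StagedVersion(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("bad version: " + text);
        }
        target = m.group(1);
        stage = m.group(2) == null ? Stage.FINAL : Stage.valueOf(m.group(2).toUpperCase());
        String n = m.group(3);
        step = (n == null || n.isEmpty()) ? 0 : Integer.parseInt(n);
    }

    @Override
    public int compareTo(StagedVersion o) {
        int byTarget = compareNumeric(target, o.target);
        if (byTarget != 0) {
            return byTarget;
        }
        int byStage = stage.compareTo(o.stage);
        return byStage != 0 ? byStage : Integer.compare(step, o.step);
    }

    // Dotted parts compared as integers; missing parts count as 0.
    private static int compareNumeric(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        StagedVersion beta1 = new StagedVersion("2.0beta1");
        System.out.println(beta1.compareTo(new StagedVersion("2.0alpha2")) > 0); // true
        System.out.println(beta1.compareTo(new StagedVersion("2.0")) < 0);       // true
    }
}
```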
If 2.0 is considered the starting point of an experimental branch (it cannot be a summit, as we cannot then have 1.8, 1.9 etc., nor the pre1 etc.), then we cannot rely on a major release always being a stable release. Maybe we can use the Linux kernel model, where odd-numbered releases are considered developmental and even-numbered ones stable. This way 0.0 will not be there (nor can it be the initial stable release?). This idea is interesting, as the product goes through a cycle of stability and experiment. Wow! Also, this way there is usually only one track. But even here, a major new experimental version has to be a separate track.
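Under the odd/even model, stability can be read straight off the version number. A minimal sketch, assuming the parity is taken from the minor number, as the Linux kernel did with series such as 2.4 (stable) and 2.5 (development); the class SeriesParity and its methods are hypothetical names.

```java
// Sketch of the odd/even convention: even minor = stable series,
// odd minor = development series. Parity on the minor number is an assumption.
class SeriesParity {
    static boolean isStable(String version) {
        String[] parts = version.split("\\.");
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return minor % 2 == 0;
    }

    // The series alternate: stable 1.2 -> development 1.3 -> stable 1.4 ...
    static String nextSeries(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major + "." + (minor + 1);
    }

    public static void main(String[] args) {
        System.out.println(isStable("2.4"));   // true
        System.out.println(nextSeries("2.4")); // 2.5
    }
}
```

Note that with this rule 0.0 would count as "stable", which is one reason the model sits awkwardly with 0.0 as a starting version.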
Delegation and Records
We need not use version control systems for parallel development, because we need a central scheme anyway for when to check in and who approves changes into the project. Thus we can avoid costly options like SourceSafe and ClearCase, or even the free centralized model of CVS or SourceForge. We could just as well conduct our whole development with emails alone, provided we split the project into a tree of packages, and these into classes, each of which can be worked on by a single developer. When the individual class owners finish, the package manager integrates their classes into the package and alerts the manager of its parent package.
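The email-driven integration flow can be pictured as a tree in which completion notices travel one level up. A minimal sketch, assuming one owner per package and using an in-memory list of messages in place of real email; PackageNode, the owner names, and the package names are all hypothetical. The parent is not marked complete automatically: its manager still has to integrate and then call markComplete, which keeps the approval step human.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical delegation tree: each package has exactly one owner.
class PackageNode {
    final String name;
    final String owner;
    final PackageNode parent;
    final List<PackageNode> children = new ArrayList<>();
    boolean complete;

    PackageNode(String name, String owner, PackageNode parent) {
        this.name = name;
        this.owner = owner;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);
        }
    }

    // The owner declares this package done; the parent's owner is alerted.
    // The outbox list stands in for the emails the article describes.
    void markComplete(List<String> outbox) {
        complete = true;
        if (parent == null) {
            return;
        }
        outbox.add("to " + parent.owner + ": " + name + " is ready for integration");
        if (parent.children.stream().allMatch(c -> c.complete)) {
            outbox.add("to " + parent.owner + ": all sub-packages of " + parent.name + " are ready");
        }
    }

    public static void main(String[] args) {
        List<String> outbox = new ArrayList<>();
        PackageNode root = new PackageNode("bingo", "lead", null);
        PackageNode ui = new PackageNode("bingo.ui", "asha", root);
        PackageNode io = new PackageNode("bingo.io", "ravi", root);
        ui.markComplete(outbox);
        io.markComplete(outbox);
        outbox.forEach(System.out::println);
    }
}
```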
This model allows a highly decentralized form of development, yet with centralized control and very low overhead (just the mails, a central website, and maybe a group mailing list?). The main point is the delegation of work to as many people as possible and, more importantly, a highly delegable design. The more spread out the package tree, and the clearer the hierarchy of communication between a parent package and its sub-packages, the more developers the project can absorb, and to that extent the better the design.
Initially the project leader designs a robust set of top-level packages and their interactions with each other. He then allocates these packages to package leaders (one person can lead many packages, but there is only one leader per package). These package leaders can themselves delegate further down to sub-package leaders. Of course, a robust full tree of packages should be designed up front, but with allowance for minor internal packages for coding convenience. We have to work out the interactions between packages too: for example, no parent package can use its child packages, while a child can use its parent freely. But if we say that any sibling package can call another, then circular references become possible. Is there a reasonable and simple rule that will eliminate this circularity yet still allow powerful imports? Maybe we can, and should, draw a dependency tree among the importable packages beforehand. This means we cannot depend on the physical directory structure alone, but also need our own conceptual dependency tree. Can we do something so that the physical tree alone serves as the cue for the whole package dependency model?
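Once the conceptual dependency tree is written down, both rules above can be checked mechanically. A minimal sketch, assuming the declared dependencies are given as a map from each package to the packages it uses, and that nesting is inferred from dotted names (a rough stand-in for the physical tree). DependencyRules is a hypothetical name; the cycle test is an ordinary depth-first search that reports a cycle when it meets a package already on the current path.

```java
import java.util.*;

// Sketch: validate a hand-declared dependency map against two rules:
// (1) a package must not use its own descendants; (2) no circular dependencies.
class DependencyRules {
    static boolean isDescendant(String pkg, String ancestor) {
        return pkg.startsWith(ancestor + ".");
    }

    static List<String> violations(Map<String, Set<String>> uses) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : uses.entrySet()) {
            for (String target : e.getValue()) {
                if (isDescendant(target, e.getKey())) {
                    out.add(e.getKey() + " uses its child " + target);
                }
            }
        }
        if (hasCycle(uses)) {
            out.add("circular dependency");
        }
        return out;
    }

    static boolean hasCycle(Map<String, Set<String>> uses) {
        // absent = unvisited, 1 = on the current path, 2 = fully explored
        Map<String, Integer> state = new HashMap<>();
        for (String p : uses.keySet()) {
            if (visit(p, uses, state)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String p, Map<String, Set<String>> uses, Map<String, Integer> state) {
        Integer s = state.get(p);
        if (s != null) {
            return s == 1; // reaching a package on the current path means a cycle
        }
        state.put(p, 1);
        for (String q : uses.getOrDefault(p, Set.of())) {
            if (visit(q, uses, state)) {
                return true;
            }
        }
        state.put(p, 2);
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> uses = new LinkedHashMap<>();
        uses.put("bingo", Set.of());
        uses.put("bingo.ui", Set.of("bingo", "bingo.io")); // child -> parent, sibling: allowed
        uses.put("bingo.io", Set.of("bingo.ui"));          // sibling back-reference: a cycle
        System.out.println(violations(uses)); // [circular dependency]
    }
}
```

This only enforces the rules; it does not answer the open question of which simple rule for siblings would rule out cycles by construction.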
Another issue to be addressed early on is job assignment and status reporting, which includes the structure of the spec document. As we discussed in Part I of this article, we can use the MoU model, in which the package manager (PM) explains a task in very minimal terms to the sub-package manager (SPM) (we need some metric for what counts as minimal), and expects the SPM to come back with a complete TODO report with deadlines and metrics. The PM can then approve or alter this report. This way the load on the parent package managers is lighter, and they also get to know what their wards have understood. The SPM, too, gets clear-cut assurance that the PM has approved the plan and cannot go back on their word. What should be the minimal (but robust) checklist for the initial 'hand-waving' report to the SPM, and what should go into the SPM's 'complete' report? This needs some thinking.
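The MoU handshake is essentially a small state machine around the SPM's report: proposed, then approved or returned for revision, and frozen once approved. A minimal sketch; the class TodoReport, its fields (task, deadline, metric), and the three states are hypothetical illustrations of the idea, not a settled checklist.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the SPM's "complete" TODO report under the MoU model.
class TodoReport {
    enum Status { PROPOSED, APPROVED, RETURNED }

    static final class Item {
        final String task;
        final LocalDate deadline;
        final String metric; // how completion will be judged

        Item(String task, LocalDate deadline, String metric) {
            this.task = task;
            this.deadline = deadline;
            this.metric = metric;
        }
    }

    final String pkg;
    final String author; // the sub-package manager
    final List<Item> items = new ArrayList<>();
    Status status = Status.PROPOSED;
    String reviewerNote = "";

    TodoReport(String pkg, String author) {
        this.pkg = pkg;
        this.author = author;
    }

    // The SPM adds items; editing a returned report resubmits it.
    void add(Item item) {
        if (status == Status.APPROVED) {
            throw new IllegalStateException("report already approved");
        }
        items.add(item);
        status = Status.PROPOSED;
    }

    // The PM approves as written; after this neither side can quietly change it.
    void approve() {
        if (items.isEmpty()) {
            throw new IllegalStateException("nothing to approve");
        }
        status = Status.APPROVED;
    }

    // ...or returns it with a note asking for changes.
    void returnWith(String note) {
        status = Status.RETURNED;
        reviewerNote = note;
    }

    public static void main(String[] args) {
        TodoReport r = new TodoReport("bingo.io", "ravi");
        r.add(new Item("buffered file reader", LocalDate.of(2002, 6, 15), "all unit tests pass"));
        r.approve();
        System.out.println(r.status); // APPROVED
    }
}
```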
Design and Performance Metrics
What are the minimal design and performance metrics that we need? Maybe we can say that a package's classes should be homogeneous, and any specific details should be hidden away in sub-packages. (Lo! This flouts the rule that a parent is independent of its children. But isn't it logical? Then where do we tweak the rules to make them consistent?) We need to keep an upper limit on Classes-per-Package, even Packages-per-Package, Methods-per-Class, Statements/Loops/Branches/Calls-per-Method, and maybe variables per class/method. One thing we can safely leave unspecified is code layout conventions, like using spaces or tabs for various structures, as these can easily be enforced by many tools. Of course naming conventions have to be enforced, but simple Java conventions should amply do. Maybe we can include a list of nouns, verbs, prefixes, and suffixes (with some semantics too) to regularize name usage. One thing that seems accepted in the Java world is to avoid underscores as far as possible (AFAP), and abbreviated prefixes or suffixes AFAP. This makes the code that much more self-documenting.
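Some of these limits can be checked automatically. A minimal sketch using standard Java reflection, checking only Methods-per-Class and underscores in method and non-constant field names; MetricsCheck, the Sample class, and the threshold of 20 are hypothetical. Static final constants are exempted because UPPER_CASE_WITH_UNDERSCORES is the normal Java convention for them. Per-method counts such as statements or branches are not visible through reflection and would need a source-level tool.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Sketch of an automated check for two of the limits discussed above.
class MetricsCheck {
    static final int MAX_METHODS_PER_CLASS = 20; // made-up threshold

    static List<String> check(Class<?> type) {
        List<String> problems = new ArrayList<>();
        int methods = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            methods++;
            if (m.getName().contains("_")) {
                problems.add("method name: " + m.getName());
            }
        }
        if (methods > MAX_METHODS_PER_CLASS) {
            problems.add("too many methods: " + methods);
        }
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic()) {
                continue;
            }
            int mod = f.getModifiers();
            boolean constant = Modifier.isStatic(mod) && Modifier.isFinal(mod);
            if (!constant && f.getName().contains("_")) {
                problems.add("field name: " + f.getName());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check(Sample.class)); // [method name: save_file, field name: item_count]
    }
}

// Hypothetical class under inspection.
class Sample {
    static final int MAX_SIZE = 10; // allowed: constant
    int item_count;                 // flagged
    void loadFile() {}
    void save_file() {}             // flagged
}
```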
Epilogue
I'm fully aware that this article is only a brain dump of my own FAQ. I fully believe that using the portal this way, for first-impression discussion or thinking aloud, is useful in the lonely future of computing. It could also make us much more disciplined in our thinking, as we are sharing a much more intimate thought process with others. Any comments will be most welcome. Soon we will turn the collective responses into an FAQ and CheckLists.