Rules for the source code repository.
Note about different source code repositories
There is a source code repository at GitHub where, among other things, the C++ program sources are stored. All branches can be found there.
The content of the source code repository
What does belong in the source code repository
This section is about what should and what should not be stored under version control in the source code repository. Obviously the source code repository should contain source code, which the GNU General Public License defines as follows:
"The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable."
What does not belong in the source code repository
Anything that can be easily generated from the source code should never be stored in the source code repository.
Composed music is edited with programs like NoteEdit. Therefore the files used by those programs (for example in the (text-based) ''not'' format) must be present in the source code repository. The program using the music will probably not use the source code directly. It needs raw sound data, which is generated from the source code and usually stored in the ''wav'' format. To save disk space, the program can often take compressed raw sound data, which is usually stored in the ''ogg'' format. Generating the ''wav'' or ''ogg'' from the ''not'' is not considered easy. It requires special software in the form of programs and soundfonts. Many developers do not want to maintain that software on their systems, especially if they are not going to work with the music. Therefore it is allowed to store the generated raw sound data in the source code repository.
Recorded sound is raw sound data (captured with a microphone). This format is not very suitable for making modifications. Nevertheless it is often the preferred format for making modifications, since there is no other format. Therefore the raw sound data is source code. An exception would be if it was easier to record the sound effect anew than to edit the raw sound data. In that case the raw sound data is not the source code. Instead, a description of how the sound is recorded is the source code. Suppose one has a sound effect where someone pours water from a bucket into a trough. The recording was disturbed by someone whistling in the background. One wants to modify the sound effect to get rid of the background whistling. If it is considered easier to redo the recording while making sure that nobody is whistling in the background than to edit away the whistling in the raw sound data, the description of how the recording is done is the source code, not the raw sound data. On the other hand, if the sound effect is the recorded oestrus sound of an extinct animal, it is not possible to redo the recording. Then the raw sound data is the source code. The raw sound data should always be accompanied by relevant metadata, such as how, when, and where it was recorded.
Modelled images are similar to composed music. They are created in special programs like Blender or GIMP. The files used by those programs (for example in the ''blend'' or ''xcf'' format) must be present in the source code repository. The program using the image will probably not use the source code directly. It needs raw image data, which is generated from the source code and usually stored in the (compressed) ''png'' format. Generating the ''png'' from the ''blend'' (by a process called rendering) is not considered easy. It requires special software. Many developers do not want to maintain that software on their systems, especially if they are not going to work with the images. The rendering is also extremely computationally intensive. Therefore it is allowed to store raw image data in the source code repository.
Photographs are similar to recorded sound. They are raw light data (captured with optical sensors), so the discussion of recorded sound above applies to them as well. When a digital camera is used, it is the raw image that is the source code, not the ''jpeg'' image that the camera has generated.
Some images are created by the artist setting the color of each pixel directly. Modification is done in the same way. Therefore the raw image data is the source code.
Message catalogs are collections of message strings from a program module or data file. Templates (''pot'') for the message files are created by tools like xgettext, which search through the program module or data file for marked strings. The templates are never modified directly (only generated). Therefore they are not source code. The generation is done with simple tools that all developers should have installed, and is not computationally expensive. Therefore the ''pot'' files should not be under version control. The templates are then used to create new message catalogs for individual languages (''po''). The actual translation work is done by editing the ''po'' files with tools like KBabel, so the ''po'' files are source code. Programs do not use the ''po'' files directly. They use ''mo'' or ''gmo'' files, which are generated from ''po'' files. This generation is done with simple tools that all developers should have installed, and is not computationally expensive. Therefore the ''mo''/''gmo'' files should not be under version control.
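To illustrate why the ''mo'' files count as easily generated, here is a minimal Python sketch that builds the command for compiling a ''po'' file with the gettext tool msgfmt. The function names and the choice to place the ''mo'' file next to the ''po'' file are assumptions made for this example, not project conventions.

```python
import subprocess
from pathlib import Path
from typing import List


def msgfmt_command(po_file: str) -> List[str]:
    """Build the msgfmt command line that compiles a po file.

    Assumption: the mo file is written next to the po file,
    with the same base name.
    """
    po = Path(po_file)
    mo = po.with_suffix(".mo")
    return ["msgfmt", "-o", str(mo), str(po)]


def compile_catalog(po_file: str) -> None:
    """Run msgfmt; requires gettext to be installed on the system."""
    subprocess.run(msgfmt_command(po_file), check=True)


print(msgfmt_command("de.po"))
```

Since a build step like this can recreate the ''mo'' file at any time, nothing is lost by keeping it out of version control.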
Program design files
Program design files can be for example UML diagrams, which are created with tools like Dia or Umbrello, or user interface definition files, which are created with tools like Qt Designer. They are usually stored in ''xml'' format and are source code. Other code is generated from those files (for example with tools like Dia2Code or Umbrello). Those generated files are not source code initially, but if they are edited further, they become source code. Note that such editing will break the connection to the design file, thereby creating a fork. So the modifications should be made to the design files whenever possible.
Documentation is usually written in some higher level language such as ''docbook'', which is the source format. From those files, lower level formats such as ''ps'', ''pdf'' and ''html'' can be generated. Those generated files should not be under version control.
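The rules so far can be summarised per file format. The following Python sketch is only an illustration: the function name `should_track` and the idea of deciding by file extension alone are assumptions made here, and the real decision depends on how a file was produced (a ''png'' may be a rendering or a hand-edited image, and an ''html'' file is only excluded when it is generated documentation).

```python
from pathlib import Path
from typing import Optional

# Formats named above that belong (or are allowed) in the repository.
TRACKED = {"not", "blend", "xcf", "po", "docbook", "wav", "ogg", "png"}

# Formats named above that are easily generated and should stay out.
# Assumption: ps, pdf and html files are generated documentation.
NOT_TRACKED = {"pot", "mo", "gmo", "ps", "pdf", "html"}


def should_track(filename: str) -> Optional[bool]:
    """Return True or False for known formats, None if the rules are silent."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension in TRACKED:
        return True
    if extension in NOT_TRACKED:
        return False
    return None


for name in ["de.po", "de.mo", "manual.docbook", "manual.pdf"]:
    print(name, should_track(name))
```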
A file can contain a filesystem, which can be extracted or mounted. For example if a file contains a ''tar'' or ''zip'' filesystem, it can be extracted with the programs tar and unzip respectively. If a file contains an ''ext2'' or ''iso9660'' filesystem, it can be mounted with mount. Files containing filesystems must not be under version control, because the version control system does not understand the structure. The files in the embedded filesystem must be in a subdirectory instead. The embedded filesystem can easily be recreated from this subdirectory.
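As a sketch of how the embedded filesystem can be recreated from its subdirectory, the following Python example packs a directory into a ''zip'' file and unpacks it again using the standard `zipfile` module. The function names and the file names in the demonstration are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def repack(directory: Path, archive: Path) -> None:
    """Create a zip file from the files stored under a subdirectory."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # Sorted order keeps the archive layout stable between runs.
        for path in sorted(directory.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(directory).as_posix())


def unpack(archive: Path, directory: Path) -> None:
    """Extract a zip file into a subdirectory that can be put under version control."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(directory)


def roundtrip_demo() -> str:
    """Pack a small directory, unpack it elsewhere, and return one file's content."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "data"
        (source / "sub").mkdir(parents=True)
        (source / "sub" / "readme.txt").write_text("hello")
        repack(source, root / "data.zip")
        unpack(root / "data.zip", root / "out")
        return (root / "out" / "sub" / "readme.txt").read_text()


print(roundtrip_demo())
```

Only the subdirectory would be committed; the archive is a build product.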
In general, files should not be compressed when under version control. A small change to the uncompressed content (for example a single byte) can cause a much larger change to the compressed file, which burdens the version control system. Textual files must not be stored in compressed form, because that has the huge disadvantage that commit mails will not contain the difference, which makes it much harder to review the work of other developers. An exception to the rule is when the compression is not easy. It may for example be done with a program that uses brute force to perform expensive calculations, like Pngcrush. Then it is allowed to store the content in compressed form.
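The effect of compression on differences can be tried out with a small Python experiment using the standard `zlib` module. This is a sketch with made-up sample data; the exact numbers depend on the data and on the compression library, so none are stated here.

```python
import zlib


def changed_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


# Made-up sample data: 200 short text lines.
original = b"".join(b"line %d of some text file\n" % i for i in range(200))

# Change exactly one byte (the first letter becomes upper case).
modified = b"L" + original[1:]

raw_diff = changed_bytes(original, modified)
compressed_diff = changed_bytes(zlib.compress(original), zlib.compress(modified))

print("bytes changed in the uncompressed content:", raw_diff)
print("bytes changed in the compressed content:", compressed_diff)
```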
These are rules for how changes should be made to the files in the source code repository. It is not easy to follow them and mistakes will happen frequently, but people should at least be aware of how things should be done in theory.
A commit should not cause any regression. This means that everything that worked before the commit must work after the commit. So one should not commit a half-done change that does not compile or run just to get it backed up in case one's local system fails before one is done writing the change. One should wait until the feature is regression free. If a backup is needed in the meantime, one should create a patch and upload it somewhere.
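In practice the version control system's own diff command would normally be used to create such a patch. Purely to illustrate what a patch contains, the following Python sketch produces a unified diff with the standard `difflib` module. The function name `make_patch` and the sample file content are hypothetical.

```python
import difflib


def make_patch(old: str, new: str, filename: str) -> str:
    """Return a unified diff describing the change from old to new."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(diff)


# Hypothetical unfinished change that is backed up as a patch, not committed.
patch = make_patch("int x = 1;\n", "int x = 2;\n", "main.cpp")
print(patch)
```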
Do not misuse the version control system as a communication channel
When commits are made, an e-mail is sent to a mailing list which the other developers read. But one should not misuse this feature as a communication channel. If one is unsure about whether a fix is correct, one should not commit it with a message like "this fix might work but I doubt it, please take a look". Instead, one should write to the ordinary development list and attach the patch.