├── NOTES ├── README.md ├── SUMMARY.md ├── Worldwide_Time_Zones.png ├── acknowledgements_and_reuse_of_materials.md ├── acs-mailman-example.png ├── activity-crs-efficiency-openstack.png ├── activity-openstack-openhub.png ├── activity-openstack-stackalytics.png ├── activity-openstack.png ├── activity-puppet-commits.png ├── activity-scm-authors-puppet.png ├── aging-openstack.png ├── bib.md ├── bmiOpenStackSoftware.jpg ├── bugzilla-lifecycle.png ├── community.md ├── companies-openstack.png ├── crs-activity-submitters-wikimedia.png ├── crs-github-example.png ├── crs-openstack-example.png ├── evaluation-models-data ├── stol.py └── stol.txt ├── evaluation-usage-netcraft.png ├── evaluation-usage-statcounter.png ├── evaluation-usage-w3counter.png ├── evaluation_dashboards.md ├── evaluation_models.md ├── factoids-openhub-git.png ├── functional-evaluation-loaoo-features-2.png ├── functional-evaluation-loaoo-features.png ├── functional-evaluation-loaoo-model.png ├── github-contributors.png ├── github-punchcard.png ├── grimoire-openstack.png ├── grimoire-tickets.png ├── grimoire.md ├── grimoire_use.md ├── grimoireng-openstack.png ├── introduction.md ├── its-activity-closers-cloudstack.png ├── its-bugzilla-example.png ├── its-bugzilla-workflow.png ├── its-github-example.png ├── kinds.md ├── location-debian-devel.png ├── mls-activity-senders-openstack.png ├── openbrr.jpg ├── openhub-liferay-commits.png ├── openhub-liferay.png ├── openstack-scm-aging-2013-07.png ├── openstack-scm-aging-2014-07.png ├── osmm.png ├── process-backlog-crs-wikimedia.png ├── processes-crs-nova-time-to-merge.png ├── processes-tickets-closed-open.png ├── processes.md ├── qsos-maturity.png ├── qsos-radar.png ├── qsos-tools.png ├── qsos.png ├── quantitative.md ├── scs-irc-example.png ├── sources_of_information.md ├── sqo-oss.png ├── stackalitics-main.png ├── stackalytics-mirantis.png ├── styles └── pdf.css ├── to_probe_further.md ├── tz-scm-authors-2013.png └── tz-scm-authors-2014.png /NOTES: -------------------------------------------------------------------------------- 1 | = Notes to be merged in the book at some time. 2 | 3 | == Scully effect 4 | 5 | A perceptible increase of women entering the fields of science, medicine, and law enforcement 6 | 7 | https://twitter.com/ginatrapani/status/690748620971380737?s=03 8 | 9 | == Zalando dashboard in GitHub 10 | 11 | http://zalando.github.io/#welcome 12 | 13 | == Data by James Falkner, Liferay, about developer motivation 14 | 15 | 16 | Blog: http://www.liferay.com/web/james.falkner/blog/-/blogs/what-do-you-think- 17 | 18 | Slideshare of complete results: http://www.slideshare.net/mobile/schtool/liferay-2012-community-survey-summary 19 | 20 | 21 | == Scancode: license scannner 22 | 23 | https://github.com/nexB/scancode-toolkit/releases 24 | 25 | 26 | == Complete table of evaluation models, with references. For now, we don't include references in the book. 
27 | 28 | | Name | Year | Orig | Method | Source | 29 | | -------- |:----:|:-----:|:------:| ------:| 30 | | Capgemini Open Source Maturity Model | 2003 | I | Yes | [reference](#bib:duijnhouwer-open) | 31 | | Evaluation Framework for Open Source Software | 2004 | R | No | [reference](#bib:koponen-evaluation) | 32 | | A Model for Comparative Assessment of Open Source Products | 2004 | R | Yes | [reference](#bib:polancic-model), [reference](#bib:polancic-comparative) | 33 | | Navica Open Source Maturity Model | 2004 | I | Yes | [reference](#bib:golden-succeeding) | 34 | | Woods and Guliani's OSMM | 2005 | I | No | [reference](#bib:woods-open) | 35 | | Open Business Readiness Rating (OpenBRR) | 2005 | R/I | Yes | [reference](#bib:www.openbrr.org-business),[reference](#bib:wasserman-business) | 36 | | Atos Origin Method for Qualification and | 2006 | I | Yes | [reference](#bib:ato-method) | 37 | | Selection of Open Source Software (QSOS) | 2006 | R | No | [reference](#bib:cruz-evaluation) | 38 | | Evaluation Criteria for Free/Open Source Software Products | 2007 | R | No | [reference](#bib:sung-quality) | 39 | | A Quality Model for OSS Selection | 2007 | R | Yes | [reference](#bib:lee-study) | 40 | | Selection Process of Open Source Software | 2007 | R | Yes | [reference](#bib:cabano-context-dependent),[reference](http://www.oitos.it/opencms/opencms/oitos/valutazione_di_prodotti/modello1.2.pdf) | 41 | | Observatory for Innovation and Technological transfer on Open Source software (OITOS) | 2007 | R | No | [reference](#bib:ardagna-focse) | 42 | | Framework for OS Critical Systems Evaluation (FOCSE) | 2007 | R | No | [reference](#bib:lavazza-beyond) | 43 | | Balanced Scorecards for OSS | 2007 | R | Yes | [reference](#bib:taibi-openbqr) | 44 | | Open Business Quality Rating (OpenBQR) | 2007 | R | Yes | [reference](#bib:carbon-evaluating) | 45 | | Evaluating OSS through Prototyping | 2008 | R | No | [reference](#bib:ciolkowski-towards) | 46 | | A Comprehensive Approach for Assessing Open Source Projects | 2008 | R | Yes | [reference](#bib:samoladas-sqo-oss) | 47 | | Software Quality Observatory for Open Source Software (SQO-OSS) | 2008 | R | No | [reference](#bib:majchrowski-operational) | 48 | | An operational approach for selecting open source components in a software development project | 2008 | R | No | [reference](#bib:delbianco-quality),[reference](#bib:delbianco-observed) | 49 | | QualiPSo trustworthiness model OpenSource Maturity Model (OMM) | 2009 | R | No | [reference](#bib:petrinja-introducing) | 50 | 51 | 52 | == Simple evaluation model (OpenMRS) 53 | 54 | https://wiki.openmrs.org/plugins/servlet/mobile#content/view/93359422/93359492 55 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # evaluating-foss-projects 2 | Evaluating Free / Open Source Software Projects (Book) 3 | -------------------------------------------------------------------------------- /SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | * [Summary](README.md) 4 | * [Introduction](introduction.md) 5 | * [Before evaluating](kinds.md) 6 | * [Sources of information](sources_of_information.md) 7 | * [Evaluating the community](community.md) 8 | * [Evaluating development processes](processes.md) 9 | * [Evaluation models](evaluation_models.md) 10 | * [Evaluation dashboards](evaluation_dashboards.md) 11 | * [Quantitative evaluation with 
Grimoire](grimoire.md) 12 | * [Using Grimoire](grimoire_use.md) 13 | * [Acknowledgements and reuse of materials](acknowledgements_and_reuse_of_materials.md) 14 | * [To probe further](to_probe_further.md) 15 | * [Bibliography](bib.md) 16 | 17 | -------------------------------------------------------------------------------- /Worldwide_Time_Zones.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/Worldwide_Time_Zones.png -------------------------------------------------------------------------------- /acknowledgements_and_reuse_of_materials.md: -------------------------------------------------------------------------------- 1 | # Acknowledgements and reuse of materials 2 | 3 | Jesus thanks both Bitergia and Universidad Rey Juan Carlos, for the opportunity of participating in writing this book. 4 | 5 | The section "Aging" of the chapter "Evaluating the community" is based on a draft of the article "[Measure your open source community’s age to keep it healthy](http://radar.oreilly.com/2014/10/measure-your-open-source-communitys-age-to-keep-it-healthy.html)", publised in O'Reilly Radar. 6 | -------------------------------------------------------------------------------- /acs-mailman-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/acs-mailman-example.png -------------------------------------------------------------------------------- /activity-crs-efficiency-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/activity-crs-efficiency-openstack.png -------------------------------------------------------------------------------- /activity-openstack-openhub.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/activity-openstack-openhub.png -------------------------------------------------------------------------------- /activity-openstack-stackalytics.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/activity-openstack-stackalytics.png -------------------------------------------------------------------------------- /activity-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/activity-openstack.png -------------------------------------------------------------------------------- /activity-puppet-commits.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/activity-puppet-commits.png -------------------------------------------------------------------------------- /activity-scm-authors-puppet.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/activity-scm-authors-puppet.png -------------------------------------------------------------------------------- /aging-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/aging-openstack.png -------------------------------------------------------------------------------- /bib.md: -------------------------------------------------------------------------------- 1 | # Bibliography 2 | 3 | Paul Anderson. Avoiding abandon-ware: getting to grips with the open development method ([HTML](http://oss-watch.ac.uk/resources/odm), visited on 2015-04-29). 4 | 5 | Victor R. Basili, Gianluigi Caldiera, H. Dieter Rombach. The Goal Question Metric ([PDF](http://www.cs.umd.edu/~mvz/handouts/gqm.pdf), visited on 2015-04-26). 6 | 7 | Barend Jonkers, Cor Nouws. Comparing LibreOffice and Apache OpenOffice ([PDF](http://www.nouenoff.nl/downloads/LibreOffice_AOO_CompetitiveFeatureMatrix_20150318.pdf), visited on 2015-06-08; [blog post linking to it](http://ostatic.com/blog/apache-openoffice-versus-libreoffice), visited on 2015-06-08). 8 | 9 | Klaas-Jan Stol, Muhammad Ali Babar. 10 | A comparison framework for open source software evaluation methods. Open Source Software: New Horizons, 389-394. ([PDF](http://ulir.ul.ie/bitstream/handle/10344/748/2010-Stol-A-Comparison.pdf), visited on 2015-06-24). 11 | 12 | Carbon, R., Ciolkowski, M., Heidrich, J., John, I., and Muthig, D. Evaluating Open Source Software through Prototyping, in St.Amant, K., and Still, B. (Eds.) Handbook of Research on Open Source Software Technological, Economic, and Social Perspectives (Information Science Reference, 2007), pp. 269-281. 13 | 14 | Assessment of the degree of maturity of Open Source open source software, ([PDF](http//www.oitos.it/opencms/opencms/oitos/Valutazione_di_prodotti/Modello1.2.pdf).) 15 | 16 | Lavazza, L. Beyond Total Cost of Ownership Applying Balanced Scorecards to OpenSource Software. Proc. International Conference on Software Engineering Advances (ICSEA) Cap Esterel, French Riviera, France, 2007, pp. 74-74. 17 | 18 | Cruz, D., Wieland, T., and Ziegler, A. Evaluation criteria for free/open source software products based on project analysis, Software Process Improvement and Practice, 2006, 11(2). 19 | 20 | Lee, Y.M., Kim, J.B., Choi, I.W., and Rhew, S.Y. A Study on Selection Process of Open Source Software. Proc. Sixth International Conference on Advanced Language Processing and Web Information Technology (ALPIT), Luoyang, Henan, China, 2007. 21 | 22 | del Bianco, V., Lavazza, L., Morasca, S., and Taibi, D. Quality of Open Source Software The QualiPSo Trustworthiness Model. Proc. Fifth IFIP WG 2.13 International Conference on Open Source Systems (OSS 2009), Skövde, Sweden, June 3-6, 2009. 23 | 24 | Golden, B. Succeeding with Open Source (Addison-Wesley, 2004). 25 | 26 | Wasserman, A.I., Pal, M., and Chan, C. The Business Readiness Rating a Framework for Evaluating Open Source, 2006, Technical Report. 27 | 28 | www.openbrr.org Business Readiness Rating for Open Source, RFC 1, 2005. 29 | 30 | Majchrowski, A., and Deprez, J. An operational approach for selecting open source components in a software development project. Proc. 15th European Conference, Software Process Improvement (EuroSPI), Dublin, Ireland, September 3-5, 2008. 31 | 32 | Polančič, G., and Horvat, R.V. 
A Model for Comparative Assessment Of Open Source Products. Proc. The 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, USA, 2004. 33 | 34 | del Bianco, V., Lavazza, L., Morasca, S., and Taibi, D. The observed characteristics and relevant factors used for assessing the trustworthiness of OSS products and artefacts, 2008, Technical Report no. A5.D1.5.3. 35 | 36 | Ciolkowski, M., and Soto, M. Towards a Comprehensive Approach for Assessing Open Source Projects Software Process and Product Measurement (Springer-Verlag, 2008). 37 | 38 | Taibi, D., Lavazza, L., and Morasca, S. OpenBQR a framework for the assessment of OSS. Proc. Third IFIP WG 2.13 International Conference on Open Source Systems (OSS 2007), Limerick, Ireland, 2007, pp. 173-186. 39 | 40 | Cabano, M., Monti, C., and Piancastelli, G. Context-Dependent Evaluation Methodology for Open Source Software. Proc. Third IFIP WG 2.13 International Conference on Open Source Systems (OSS 2007), Limerick, Ireland, 2007, pp. 301-306. 41 | 42 | Sung, W.J., Kim, J.H., and Rhew, S.Y. A Quality Model for Open Source Software Selection. Proc. Sixth International Conference on Advanced Language Processing and Web Information Technology, Luoyang, Henan, China, 2007, pp. 515-519. 43 | 44 | Ardagna, C.A., Damiani, E., and Frati, F. FOCSE An OWA-based Evaluation Framework for OS Adoption in Critical Environments. Proc. Third IFIP WG 2.13 International Conference on Open Source Systems, Limerick, Ireland, 2007, pp. 3-16. 45 | 46 | Atos Origin Method for Qualification and Selection of Open Source software (QSOS) version 1.6, 2006, Technical Report. 47 | 48 | Petrinja, E., Nambakam, R., and Sillitti, A. Introducing the OpenSource Maturity Model. Proc. ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS 09), Vancouver, Canada, 2009, pp. 37-41. 49 | 50 | Woods, D., and Guliani, G. Open Source for the Enterprise Managing Risks Reaping Rewards (OReilly Media, Inc., 2005). 51 | 52 | Duijnhouwer, F., and Widdows, C. Open Source Maturity Model, Capgemini Expert Letter, 2003. 53 | 54 | Koponen, T., and Hotti, V. Evaluation framework for open source software. Proc. Software Engineering and Practice (SERP), Las Vegas, Nevada, USA, June 21-24, 2004. 55 | 56 | Samoladas, I., Gousios, G., Spinellis, D., and Stamelos, I. The SQO-OSS Quality Model Measurement Based Open Source Software Evaluation. Proc. Fourth IFIP WG 2.13 International Conference on Open Source Systems (OSS 2008), Milano, Italy, 2008. 57 | 58 | Polančič, G., Horvat, R.V., and Rozman, T. Comparative assessment of open source software using easy accessible data. Proc. 26th International Conference on Information Technology Interfaces, Cavtat, Croatia, June 7-10, 2004, pp. 673-678. 59 | 60 | Udas, K., and Feldstein, M. Apples to Apples: Guidelines for Comparative Evaluation of Proprietary and Open Educational Technology Systems. May 2006. SUNY Learning Network at the State University of New York, USA. ([PDF](http://vcampus.uom.ac.mu/vcilt/resources/ApplestoApples.pdf), visited on 2015-07-15). 
61 | -------------------------------------------------------------------------------- /bmiOpenStackSoftware.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/bmiOpenStackSoftware.jpg -------------------------------------------------------------------------------- /bugzilla-lifecycle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/bugzilla-lifecycle.png -------------------------------------------------------------------------------- /community.md: -------------------------------------------------------------------------------- 1 | # Evaluation of FOSS communities 2 | 3 | For many FOSS projects, the communities supporting them are mainly responsible for the evolution of the project. Evaluating the communities is therefore fundamental to evaluating the project. 4 | 5 | ## Different scopes: developers, contributors, users... 6 | 7 | FOSS communities are diverse, and may include many different actors. But in general, according to their scope, the following, usually overlapping, communities can be defined: 8 | 9 | * The development community, composed of people in charge of developing and maintaining the software produced by the project. 10 | * The contributing community, composed of all the people actively contributing, not only with code. Examples of contributors are: submitters of bug reports, participants in discussions in mailing lists, translators, writers of documentation, etc. The contributing community includes developers too. 11 | * The user community, composed of users of the software who participate somehow in the community. This can be by asking questions, by attending events, by joining social network groups with interest in the project, etc. Usually the user community includes the development and contributing communities, since they are also users of the system. 12 | * The ecosystem community, composed of all stakeholders not only in the project itself, but in the whole ecosystem of projects related to it. The user community is a part of the ecosystem community. 13 | 14 | We define these communities to highlight different populations that may be relevant for an evaluation. It is important to realize that their borders are fuzzy, and people move from one to another as time passes. 15 | 16 | For each of these scopes, different evaluation means can be used. Despite the apparent diversity, we can also identify some techniques and parameters that are useful for all of them. The rest of this chapter will go into the peculiarities of each community, and will show as well what they have in common and the techniques that are useful for evaluating all of them. 17 | 18 | ### Development community 19 | 20 | The development community is composed of the persons developing and maintaining the products produced by the project: software and related artifacts, such as documentation. 21 | 22 | In the case of FOSS projects with open development models based on coordination tools, there is a lot of information available about them. Usually, data can be collected from the following repositories: 23 | 24 | * Source code management repository. Almost all the information in it is produced by the development community, since it consists mainly of changes to source code.
In fact, one of the ways of defining the development community is as "those people who have contributed at least one change". 25 | * Code review system. All reviewers can be considered as part of the development community. Most of the submitters of change proposals are also developers, or they are acquiring that status. 26 | * Issue tracking system. Developers participate in the ITS by opening, commenting on and closing tickets. They are not the only ones opening or commenting, but usually only they can fix issues, and close tickets. 27 | * Asynchronous and synchronous communication. It is very usual to have separate channels for developers, which allows for separate tracking of this community. 28 | 29 | In summary, most of the data in those SCM, CRS and ITS repositories is related to development activities, and developers usually have separate channels in ACS and SCS. Therefore, the evaluation of the development community in open projects, where all these repositories are public, can be very detailed. 30 | 31 | 32 | ### Contributing community 33 | 34 | The contributing community sits somewhere between the development community and the user community. Contributors are usually users that are on the road to becoming developers. But they may or may not walk that path towards the development community. This makes it difficult to specifically track contributors who are not developers. 35 | 36 | * Issue tracking system. Tickets are opened by contributors, whether they are developers or not. Since we consider both feature requests and bug reports as valuable contributions to the project, everything happening in the ITS is performed by the contributing community. However, the "responsive" part of the action is carried out by developers. 37 | * Asynchronous and synchronous communication. Contributors may join development channels. But since they are users as well, contributors are also present in user channels. This makes it difficult to track their activity, except that they can be identified in the ITS, and their identity linked to ACS and SCS. 38 | 39 | Since the difference between "contributor" and "developer" is an especially fuzzy one, in many cases both communities are considered as one. However, in some sense contributors are the pool where developers come from. People usually become developers after contributing to the project for a while. Therefore, for estimating the future of the development community, and the engagement of people who could become developers, studying the contributor community is especially interesting. 40 | 41 | ### Users community 42 | 43 | The community of users of a FOSS project is much more difficult to evaluate than those of contributors and developers. In fact, even estimating the number of users is usually difficult. Usage is in most cases passive, in the sense that almost no interaction with the project is needed to become a user. In most cases of non-FOSS software, becoming a user implies purchasing a license, which is an action that can be tracked. But in the case of FOSS software, it is enough to get the software somehow, and start using it. No red tape is involved. 44 | 45 | Therefore, the sources of information to estimate the size of the community of users are indirect: 46 | 47 | * Downloads. Many projects maintain a download area. When the primary usage of the product is via those areas, the number of downloads can be an estimator of the number of users.
Of course, downloads of different releases have to be taken into account, and some model of when users reinstall with a newer version is needed. But this method can be enough to estimate trends and order-of-magnitude numbers. Of course, if most of the usage is not by direct download, the numbers are much less precise. This happens, for example, when the software is mainly available through FOSS distributions or via third party download areas. 48 | * Questions and comments in user forums. Given that the ratio of users to contributors is very large in most cases, it can be assumed that questions and comments about the product in third party forums are mainly by users. Therefore, the number of those questions and comments can be a proxy for estimating the user population. Some models to convert those numbers into numbers of users are needed, but again trends over time can be somewhat accurate. 49 | * Presence in FOSS distributions. Some FOSS distributions maintain their own statistics about package (and therefore, product) installation. For example, in Debian the opt-in Popularity Contest maintains accurate stats of installed packages. From those numbers, and estimating the total population of Debian users, total usage in Debian can be estimated. From there, estimations for other distributions can be extrapolated. These numbers are probably not very accurate, but can provide an order-of-magnitude estimation. 50 | * Answers to polls and surveys. Polls and surveys of specific populations, or of the population in general, can also be a source of information. This is a general technique to learn about user adoption, which, compared to the others, has the main drawback of its cost. Only the really popular FOSS products will appear in general surveys, but in some cases this is enough to have an idea. For example, the usage of the Firefox or Chrome web browsers can be estimated this way. 51 | * Raw numbers on the Internet. Some services, such as Google Trends, provide some information about how popular terms are on the web. In some cases, those numbers can be used to estimate trends in usage, assuming that the more popular a product, the more appearances it will have on the global web. 52 | 53 | There are some specific cases when a more concrete estimation, or at least a lower bound for an estimation, can be established: 54 | 55 | * Software sending "beacons" to the project. This can be desktop or mobile software connecting to a certain location with a "Here I am" message, or a web product including components that are downloaded from a certain website. In both cases, since FOSS software can be changed, maybe there are versions of the product with the beacon removed. In addition, maybe there are products being used without an Internet connection. But when these cases can be neglected, the estimation of usage can be very good. A very specific case is when the software, as a part of its normal functioning, identifies itself somehow. For example, web browsers send identification strings to web servers. These strings can be used to estimate usage. 56 | * Software which answers when queried. This is a very specific, but very accurate, case, in which the software can be located and queried. The best known case is the estimation of web servers, where the user base of Apache or nginx is tracked periodically by querying web servers all over the world for their identification string. 57 | * Software distributed through markets.
When the product is distributed mainly through a market (a mobile or a distribution app market), the market usually provides detailed numbers about installations, uninstallations, etc. 58 | 59 | These three cases are rare, but when they happen, estimations can be very accurate. The next pictures show some cases (web browsers, web servers) for which these methodologies can be used. That allows for the usage estimation of some FOSS products, such as Apache HTTP Server, nginx, Chrome or Firefox. 60 | 61 | ![StatCounter stats of browser usage](evaluation-usage-statcounter.png) 62 | ![W3Counter stats of browser usage](evaluation-usage-w3counter.png) 63 | *Example of usage estimation: [StatCounter top 9 browsers (May 2015)](http://gs.statcounter.com/#all-browser-ww-monthly-201505-201505-bar), top, and [W3Counter web browser market share](http://www.w3counter.com/globalstats.php) (May 2015), bottom. Both surveys are performed by using the identification information from web browsers in large collections of web sites. It is interesting to notice how they differ, even when they seem to use similar methodologies.* 64 | 65 | ![Netcraft stats of active servers](evaluation-usage-netcraft.png) 66 | *Example of usage estimation: [Netcraft web server survey (May 2015)](http://news.netcraft.com/archives/2015/05/19/may-2015-web-server-survey.html). This survey is performed by querying web servers (in this case, the top million busiest sites) for their identification string.* 67 | 68 | ### Ecosystem community 69 | 70 | The ecosystem community is a kind of mega-community, including all the communities relevant to the project under evaluation. All the above comments for developer, contributing and user communities apply, since all of them are represented in this ecosystem community. But at the same time, there are more overlaps, since many developers, contributors or users may be in many of the communities in the ecosystem. 71 | 72 | The ecosystem community is difficult to study because it is usually large, and is spread through many different infrastructures. In fact, the first problem to address is to find out all the projects that form a part of it, since interrelations between components can be complex. However, this mega-community is very important for the long term sustainability of the project under evaluation. Usually, resources for most projects, including developers and users, come primarily from their ecosystem communities. Some of them can work as attractors, bringing new resources to the ecosystem community from the outside world. Of course, identifying those projects that create and nurture an ecosystem community is very important to understand long term trends in FOSS technologies. 73 | 74 | Just as an example, when studying the ecosystem for a GNOME application, the whole GNOME ecosystem has to be taken into account, because it will be relevant for the future of the project. For example, if basic GNOME libraries stop evolving, it is difficult for the application to keep pace with future needs. 75 | 76 | From another point of view, the definition of the ecosystem, and therefore of the ecosystem community, is something that depends on the objectives of the evaluation. The ecosystem can be defined only for those projects with strong ties and great dependency, or for all those on which the project depends to some extent. The former case is the most usual, since dependencies and relationships are easy to perceive.
But the latter can lead to important conclusions, such as when many projects discovered that they were hit by the Heartbleed bug, deep in a piece of software produced by a handful of developers, which was included in many, many very popular programs. 77 | 78 | ## Common techniques 79 | 80 | In addition to the analysis of the sources of information detailed in the previous chapter, there are some techniques that can be applied, mainly to find out about the non-developer communities. 81 | 82 | ### Surveys and interviews 83 | 84 | Surveys and interviews can help to obtain information that cannot be inferred from the project repositories. This can be because the needed information is not available in such repositories, or because the target population is not using them. 85 | 86 | An example of the first case is the effort devoted to developing the software. If an evaluation parameter is the effort that a project puts into development and maintenance of the software, for example in aggregated person-months, that is not information that can be reliably extracted from repositories. But it can be obtained if developers answer a simple survey. 87 | 88 | An example of the second one is user satisfaction. In some cases, a web-based survey where users rank the software with a five-star scheme may be enough. The main trouble here is to reach a sample of users representative enough of the user population for the survey to be statistically significant. 89 | 90 | Some projects run this kind of survey on a regular basis, publishing their results. The main trouble with such surveys is that they are not comparable from project to project, which means that usually new surveys are needed, which is a time consuming and expensive procedure. It would be very convenient for evaluators if FOSS projects agreed on some common questions for developers and users, and standardized some surveys that could be used for the most common evaluation scenarios. 91 | 92 | Interviews with experts on a project are also a good source of data. Open or directed interviews provide mainly qualitative information about a community, which can enrich or complement quantitative information obtained by other means. 93 | 94 | ### Traces in collaboration systems 95 | 96 | All communication systems which can be analyzed can provide useful information for some kinds of community evaluation. It was already mentioned how ACS and SCS, as defined by the project, can be valuable sources of data. But other external communication channels can also help. For example, mentions on Twitter or other social networks, and even sentiment analysis on those mentions, can say a lot about how the project and its community are perceived. 97 | 98 | ### Evaluation of documentation and third party studies 99 | 100 | Documentation can also provide details about the community. In some cases, the very availability of documentation, or of certain kinds of documentation, is a relevant fact for users. In some others, user-generated documentation can also be a proxy for estimating some parameters, such as user involvement in the project. 101 | 102 | The documentation generated by the project itself in some cases describes some details about the community. For example, developer documentation usually describes the repositories and communication channels used by the project, which can be the source for empirical analysis of those repositories. Documentation can be useful as well for determining project policies with respect to participation, structure and decision making in the community.
103 | 104 | ## Evaluating activity 105 | 106 | One of the aspects of a community that is most usually evaluated is activity. In this context, evaluating activity refers to finding signs and traces of activity performed to make the project advance towards its goals. Activity can be of different kinds, such as: 107 | 108 | * committing patches to the source code management system 109 | * reporting, commenting or fixing bugs in the issue tracking system 110 | * submitting patches or reviewing them in the code review system 111 | * sending messages to mailing lists or synchronous communication systems 112 | 113 | Not all the activity is observable, and suitable for evaluation. For example, in mailing lists it is easy to know when a message is sent: it is enough to explore the list archive. But it is difficult to know who received that message (the list of recipients is usually not public), and almost impossible to know who read it (reading is a private activity performed in your own mailer program). 114 | 115 | But the observable activity is usually good enough to know about the heartbeat of the project, about how many people are active in different roles, and about the general trends. 116 | 117 | There are several analyses of activity suitable for evaluation. The most common are: 118 | 119 | * Parameters reflecting activity for a certain period. For example, the number of changes to the source code for the whole history of the project, or the number of messages sent to mailing lists of the project during a certain week. 120 | * People active for a certain period. For example, people fixing bugs during a release period, or people providing code review advice during the last month. 121 | * Evolution of any of them. For example, new tickets per month for the whole life of the project, or messages sent to IRC channels per week. 122 | * Trends for any of them. For example, the increase (or decrease) in the number of messages posted in the forums for the project from December 2014 to January 2015. 123 | 124 | ![Activity in Puppet (commits per month) circa June 2015](activity-puppet-commits.png) 125 | *Number of commits per month for Puppet, as [shown by Grimoire Dashboard](http://bitergia.dev.puppetlabs.com), circa June 2015. Trends for the last year, month and week are shown as well.* 126 | 127 | Parameters by themselves only provide a first hint. Saying that a project is performing 2,303 commits in one month is a first indicator of how active the project is, but doesn't provide too much information. Putting it into context starts to make things more interesting. For example, comparing two projects with similar functionality, but one of them committing five times as much as the other, is a first step towards comparing their activities. 128 | 129 | However, commit patterns may be very different from project to project, and a simple comparison may be misleading. For example, one of them may be committing every single change proposal right away, just to improve it later. Another one, meanwhile, may be following a stringent code review process, committing only after several iterations that improved change proposals. The first pattern will produce many more commits than the second. The same can be said for other parameters. 130 | 131 | Comparisons within the same project are usually much more interesting and fair. If the project didn't change policies or patterns during the last two months, comparing activity parameters will provide a good idea of trends.
Comparisons over larger periods of time will allow for detecting the impact of changes in policies, tools or patterns. For example, changes of the source code management system, the introduction of code review, or policies on closing old tickets are reflected in the long-term charts about activity. And of course, growth, stagnation or decrease in activity can be clearly perceived over time. 132 | 133 | In addition to the raw parameters on activity, the parameters related to the persons performing that activity are also relevant. They allow for a first characterization of the active community in several areas. An exponential growth in code authors, or a steady decline in bug fixers, will certainly be interesting subjects of further analysis. 134 | 135 | Several of these parameters together show a multifaceted view of the project. As an example, the next figures show a summary of activity of the same project, OpenStack, as shown in three different dashboards: Grimoire, Stackalytics, and Open Hub. 136 | 137 | ![Activity in OpenStack as shown by Grimoire Dashboard, circa June 2015](activity-openstack.png) 138 | *Activity in OpenStack: summary of activity in several repositories over time, as [shown by Grimoire Dashboard](http://activity.openstack.org), circa June 2015.* 139 | 140 | The Grimoire Dashboard shows activity in each kind of repository, which allows for easy comparison, while at the same time the general trends of activity in the project are visible. It shows some metrics about the people active in different roles. 141 | 142 | ![Activity in OpenStack as shown by Stackalytics, circa June 2015](activity-openstack-stackalytics.png) 143 | *Activity in OpenStack: summary of code merges over time, and split by company and module, as [shown by Stackalytics](http://stackalytics.com), circa June 2015.* 144 | 145 | Stackalytics focuses on changes merged, although it shows other activity as well. The summary includes activity by company and by module. 146 | 147 | ![Activity in OpenStack as shown by Open Hub, circa June 2015](activity-openstack-openhub.png) 148 | *Activity in OpenStack: summary of activity over time, as [shown by Open Hub](https://www.openhub.net/p/openstack), circa June 2015.* 149 | 150 | Open Hub shows a chart with the history of the activity, and some factoids about it, with a focus on activity in the source code management system. 151 | 152 | ### Activity in source code management 153 | 154 | Activity in source code management reflects how the project is producing changes for the products it builds. Source code management stores "commits", each of them being a change (or "patch") to the source code. Each change is different in nature, size, complexity, etc., which makes it difficult to compare individual changes. However, when we consider large collections of changes, trends become apparent. 155 | 156 | In particular, when a project has pre-merge code review, this metric is very difficult to cheat. If a developer tries to split a commit into several, to increase their personal commit count, code reviewers would complain. Usually, the very possibility that this happens is enough to discourage developers who could be tempted to split commits. 157 | 158 | However, it is important to realize that a single commit can be very important for the project, and be the result of a great effort. This is especially the case when we're looking at the numbers of a specific person, instead of aggregate numbers.
Therefore, commit counts should not be used as the basis of reward systems, for example. 159 | 160 | When used properly, commit counts can be a good estimator for total effort. Recent studies show that, above a certain number of commits per month, it is very likely that the developer works full time on the project. Numbers below that threshold can be prorated to estimate a fraction of full-time effort. This threshold is dependent on the project, but is usually around 10-15 commits per month in complex systems with code review and continuous integration. 161 | 162 | ### Activity in code review and ticketing systems 163 | 164 | In CRS, activity is usually measured as the number of completed (or started) review processes per period. In the case of code reviews where different versions of the proposed change can be submitted, the total number of versions submitted for review is also significant. 165 | 166 | It is important to notice that when pre-merge code review is in place, the number of code review processes ending in a merge is equal to the number of commits. However, the metrics may show some differences depending on the dates considered. For reviews, either the date of the start or of the finish of the process may be considered. For commits, either the date of authorship or the date of the commit (usually the date of the merge). 167 | 168 | The number of review processes is a good indicator of the volume of review activity. Even when not all reviews require the same effort or have the same complexity, aggregated numbers tell a lot about the resources needed for the code review processes. 169 | 170 | The number of review processes ending in merge or abandonment is also important. In fact, the difference between new code reviews, and merged plus abandoned code reviews, for a period of time is an indicator of whether the project is coping with all proposed changes or not. 171 | 172 | In the case of ITS, activity is measured either in terms of open tickets, or closed tickets. In fact, both are important indicators, and their difference shows whether the project is coping well with new tickets. The number of state changes, and the number of comments on tickets, are good indicators of activity as well. 173 | 174 | For both systems, efficiency in closing is an important factor. It is the ratio of new tickets or review processes to closed tickets or finished (merged or abandoned) review processes. When this number is larger than one, that means that the project is opening more than it is closing, which is a problem in the long term. If it is lower than one, the project is "recovering old workload", by closing more issues than are being opened. 175 | 176 | ![Efficiency in code review (OpenStack, 2015 1Q)](activity-crs-efficiency-openstack.png) 177 | *Example of efficiency: [new versus closed review processes in OpenStack, 2015, first quarter](http://activity.openstack.org/dash/reports/2015-q1/pdf/2015-q1_OpenStack_report.pdf).* 178 | 179 | ### Activity in communication systems 180 | 181 | Activity in communication systems is usually measured in messages. But messages in different kinds of communication systems may be very different. For example, it usually takes much longer to write an email message than to write a one-line comment in an IRC channel. This means that metrics from one system cannot be compared with metrics from another one, even if they are similar.
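As a minimal sketch of how such message counts can be obtained, the following Python snippet (the archive path is hypothetical, and a locally downloaded mbox file is assumed) counts the messages per month in a mailing list archive:

```
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime

def messages_per_month(mbox_path):
    """Count messages per month in a local mbox archive."""
    counts = Counter()
    for message in mailbox.mbox(mbox_path):
        date_header = message.get("Date")
        if not date_header:
            continue  # skip messages without a Date header
        try:
            date = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            continue  # skip malformed dates
        counts[(date.year, date.month)] += 1
    return counts

# Example usage (the file name is just an illustration):
for (year, month), total in sorted(messages_per_month("project-devel.mbox").items()):
    print("{}-{:02d}: {} messages".format(year, month, total))
```

The same per-period counting can be applied to IRC logs or forum posts, as long as each message carries a date.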
182 | 183 | But numbers for different points in time can be compared, which allows for detecting trends, and even for estimating the amount of effort needed to track all communication channels for a project. Since core developers may need to track all of them, this is a first estimation of how the communication cost is hitting productivity. 184 | 185 | ## Active persons 186 | 187 | In addition to activity itself, knowing who is causing that activity is very important when evaluating a community. Active persons in a project can be measured in several repositories: 188 | 189 | * SCM: In modern SCM systems, such as git, authors and committers can be counted separately. Authors are persons authoring changes to source code. Committers are persons committing those changes to the repository. Authors can also be committers, if they have commit rights. But if they don't, usually they send their changes to committers, who are the ones merging them into the repository. However, sites such as GitHub or GitLab make things more complex, since when the changes are contributed via the web interface, as pull requests, authors are considered as committers even if they don't have commit access to the repository. 190 | 191 | ![Evolution of authors in Puppet](activity-scm-authors-puppet.png) 192 | *Example of active persons: Active authors per month in the Puppet project, circa June 2015.* 193 | 194 | * CRS: Code review systems allow for the identification of several populations of active persons: change proposers (initiators of review processes), reviewers, rejecters (reviewers rejecting changes, asking for new versions), accepters (reviewers accepting changes), and abandoners (submitters abandoning a proposed change). 195 | 196 | ![Evolution of change submitters in Wikimedia](crs-activity-submitters-wikimedia.png) 197 | *Example of active change submitters: Active submitters of proposed changes for code review, per month, in the Wikimedia projects, circa June 2015.* 198 | 199 | * ITS: The main active populations to track are ticket openers and closers, and people participating by changing states or commenting. Openers are persons contributing new bug reports or feature requests. In many cases they are not developers, but people hit by a bug, or needing a new feature, and engaged enough with the project to devote some time to file the new ticket. People changing states and closing tickets are very likely developers in the project. People commenting are either developers, or non-developers (maybe the one who submitted the ticket) collaborating in the bug-fixing or feature-implementation processes. 200 | 201 | ![Evolution of ticket closers](its-activity-closers-cloudstack.png) 202 | *Example of active ticket closers: Active people closing tickets, per month, in the CloudStack project, circa June 2015.* 203 | 204 | * Communication channels: In most communication channels, whether synchronous or asynchronous, the active persons that can be measured are senders (or posters). In most of these systems the information about who is actually reading, or even receiving, the messages sent is not easy to obtain, or does not exist. In the systems that allow for it, the number of persons actively answering or following up on a message is interesting as well.
205 | 206 | ![Evolution of senders in IRC](mls-activity-senders-openstack.png) 207 | *Example of active senders in IRC channels: Active people sending messages in IRC channels, per week, in the OpenStack project, circa June 2015.* 208 | 209 | Of course, in addition to the raw numbers of active persons, the ratio of any parameter showing activity to the number of people in the group causing that activity is especially relevant. For example, the ratio of commits to authors of those commits over time quickly shows whether the number of commits per author is growing or not. 210 | 211 | ## Merging identities 212 | 213 | For most of the studies based on tracking persons, it is important to merge all the identities that a single person may have in the repositories into a single merged identity. That can be done at four levels: 214 | 215 | * The repository level. That consists of merging all the identities of the same person in a given repository. For example, merging all your identities in a certain git repository. This is useful, for example, to count the real number of people working in that repository. 216 | * The repository kind level. In this case, all the identities of the same person, across all the repositories of the same kind for a certain project, will be merged. This is useful for studies of all repositories of the same kind. For example, for counting the total number of developers contributing to source code, and thus to git repositories, of a certain project. 217 | * The project level. In this case, all identities of the same person, across all the repositories of any kind of the same project, will be merged. This is needed to know about persons at the project level, such as for evaluating the population of contributors of a project across all its repositories of any kind. 218 | * The global level. All identities of a certain person, in any repository of any analyzed project, are merged into a single merged identity. This is useful when tracking people working in several projects. For example, for finding developers working both in project X and project Y. 219 | 220 | In some cases, the projects keep some information to track the multiple identities of developers. But in most cases, you can only rely on heuristics and on manual comparison and merging of identities. There are many heuristics that can be used, but they can be tricky depending on the circumstances, over- or underperforming in specific projects. One example is merging email addresses when the complete name string matches. To illustrate this heuristic, let's use the following email addresses: 221 | 222 | ``` 223 | Jesus M. Gonzalez-Barahona 224 | Jesus M. Gonzalez-Barahona 225 | ``` 226 | 227 | A heuristic finding exact matches in names would correctly merge these two identities. But now consider the same heuristic applied to these two addresses: 228 | 229 | ``` 230 | John Smith 231 | John Smith 232 | ``` 233 | 234 | Given that John Smith is a very common name, it could perfectly well be the case, especially in a large community, that those identities correspond to different persons, and therefore shouldn't be merged. 235 | 236 | In general, this happens with any heuristic you may come up with. That is the reason why usually the merging of identities is really a mix of applying heuristics and manually checking the identities. Of course, when the project itself is involved, and the real persons whose identities are merged collaborate, the process can be reviewed by them. This is the best way of ensuring accuracy.
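To make this kind of heuristic concrete, the following sketch groups identities whose full name strings match exactly (the names, email addresses and input format are hypothetical); as discussed above, its output still needs manual review, since a common name may wrongly collapse different persons into one merged identity:

```
from collections import defaultdict

def merge_by_exact_name(identities):
    """Group (name, email) identities whose full name matches exactly.

    This is only a heuristic: identities sharing a common name may
    belong to different persons, so the result needs manual review.
    """
    merged = defaultdict(set)
    for name, email in identities:
        merged[name.strip()].add(email.lower())
    return dict(merged)

# Hypothetical input data, just for illustration:
identities = [
    ("Jesus M. Gonzalez-Barahona", "jgb@example.org"),
    ("Jesus M. Gonzalez-Barahona", "jgb@example.com"),
    ("John Smith", "jsmith@example.org"),
    ("John Smith", "john@example.net"),  # maybe a different John Smith!
]

for name, emails in merge_by_exact_name(identities).items():
    print(name, "->", sorted(emails))
```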
237 | 238 | As was commented at the beginning of this section, the more accurate this merging process is, the better the estimation of parameters that depend on identifying persons, and not identities. 239 | 240 | ## Aging 241 | 242 | Of the many aspects to explore in the community of a FOSS project, turnover and age structure are some of the most important. Turnover shows how people are entering and leaving the community. It tells how attractive the community is, and how it retains people once they join. Age structure, understanding age as "time in the project", shows how long ago members joined it. It tells how many people are available at different stages of experience, from old-timers to newbies. Together, both can be used to estimate engagement, to predict the future structure and size of the community, and to detect early potential problems that could prevent healthy growth. 243 | 244 | ### The community aging chart 245 | 246 | Both turnover and age structure can be estimated from data in software development repositories. A single chart can be used to visualize turnover and age structure data obtained from these repositories: the community aging chart. This chart resembles to some extent the [population pyramid](http://en.wikipedia.org/wiki/Population_pyramid) used to learn about the age of populations. It represents the "age" of developers in the project, in a way that provides insight into its structure. 247 | 248 | ![Aging charts for the OpenStack community](aging-openstack.png) 249 | *Example of aging charts: [Community aging charts](http://activity.openstack.org/dash/browser/demographics.html) for authors of code, as found in git repositories of the OpenStack project (left) and ticket participants, as found in the OpenStack Launchpad (right), circa June 2015. Each pair of blue and green bars corresponds to a generation of six months.* 250 | 251 | In the aging chart, each pair of horizontal bars shows how a "generation" is behaving. The Y axis represents how old each generation is, with younger ones at the bottom. For each generation, the green bar (attraction) represents the number of people that joined it. In other words, how many people were attracted to the community during the corresponding period (say, the first semester of 2010). Meanwhile, the blue bar (retention) represents how many people in that generation are still active in the community. In other words, how many of those that were attracted are still retained. 252 | 253 | ### One chart, many views 254 | 255 | The aging chart shows many different aspects of the community. Let's review some of them. 256 | 257 | The ratio of the pair of bars for each generation is its retention ratio. For the newest generation, it is 100%, since people recently entering the community are still considered to be active (but that depends on the inactivity period, see below). A ratio of 50% means that half the people in the generation are still retained. Comparing the length of each pair of bars, we can quickly learn about which generations were most successfully retained, and which ones mostly abandoned the project. 258 | 259 | The evolution of the green bars tells us about the evolution of attraction over time. Most successful projects start with low attraction, but at some point they start to become very attractive, and the bars grow very quickly.
When a project enters maturity, usually its attraction becomes more stable, and can even start to decline, while the project is still extremely successful, just because it is no longer "sexy enough" for potential newbies. 260 | 261 | The evolution of the blue bars tells us about the current age structure of the community. If the bars at the top are large, but those at the bottom are small, the community is retaining early generations very well, but having difficulties retaining new blood. On the contrary, if the bars at the top are small while those at the bottom are large, newcomers are staying, while experienced people have already left. Blue bars can only be as large as green bars (you cannot retain more people, for a certain generation, than those that you attracted to it). Therefore, "large" and "small" for blue bars is always relative to green bars. 262 | 263 | ### Different charts for different information 264 | 265 | To build the community aging chart, three parameters have to be considered: generation period, inactivity period, and snapshot date: 266 | 267 | * The generation period defines how long generations are: that is the granularity of the chart. It is usually one year, or maybe six months for younger communities. People in the community are going to be charted according to their generation, using this granularity. 268 | * The inactivity period is how long we wait before considering that somebody left the community. We don't know if persons really left the community: maybe they are on vacation, or on a medical leave. So, we have to estimate that "if somebody was not active during the last m months, we consider that person as a departure from the community". That m is the inactivity period, which is usually equal to the generation period, but could be different. 269 | * The snapshot date is the date we consider as "now". That is, we can calculate the community aging chart for today, but also for any time in the past. In fact, comparisons of charts for different snapshot dates say a lot about the evolution of the attraction and retention of the project over time. 270 | 271 | Comparing a community aging chart from the past with the current one lets us compare the potential we had some time ago with the reality now. In most development communities, people inactive for a long period are very unlikely to show up again. That means that the sum of the retention bars in the chart snapshotted two years ago is the maximum population that the community is going to have two years later, save the generations entering during these two years. 272 | 273 | ## One example and some comments 274 | 275 | As an example, we can compare the aging chart for OpenStack in July 2014 with the same chart for July 2013. Both charts show six-month generations and use a six-month inactivity period as well. Obviously enough, the latter includes two bars fewer, those corresponding to the last two generations, which had still not joined the project in July 2013. Green bars corresponding to generations more than one year old in July 2014 are exactly the same as those in the chart for July 2013, only shifted by one year. If a generation attracted a number of people, that does not depend on when we set the snapshot.
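The two charts below, for authors in OpenStack git repositories, are built from exactly this kind of data. As an illustration, a minimal sketch of how attraction and retention per generation can be computed (assuming the input is a list of (person, date) activity records, such as commit authorships, which is just one possible format) could be:

```
from collections import defaultdict

def aging_chart(events, snapshot, generation_days=182, inactivity_days=182):
    """Compute attraction and retention per generation.

    events: iterable of (person, datetime) activity records (e.g. commits).
    Returns {generation: (attracted, retained)}, where generation 0 is the
    most recent one and each generation spans generation_days days
    (182 days roughly matches the six-month generations used below).
    """
    first, last = {}, {}
    for person, date in events:
        if date > snapshot:
            continue  # ignore activity after the snapshot date
        if person not in first or date < first[person]:
            first[person] = date
        if person not in last or date > last[person]:
            last[person] = date

    chart = defaultdict(lambda: [0, 0])
    for person, joined in first.items():
        generation = (snapshot - joined).days // generation_days
        chart[generation][0] += 1  # attracted: joined during this generation
        if (snapshot - last[person]).days < inactivity_days:
            chart[generation][1] += 1  # retained: still active at the snapshot
    return {g: tuple(bars) for g, bars in chart.items()}
```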
276 | 277 | ![Aging chart for OpenStack authors, circa July 2013](openstack-scm-aging-2013-07.png) 278 | *Aging chart for authors in OpenStack git repositories in July 2013* 279 | 280 | ![Aging chart for OpenStack authors, circa July 2014](openstack-scm-aging-2014-07.png) 281 | *Aging chart for authors in OpenStack git repositories in July 2014* 282 | 283 | If we focus now on the one-year-old generation for July 2013 (the third one, counting bottom-up), we can see how it is represented one year later. From a total of about 190 persons attracted, about 100 were still retained in July 2013. That means that in July 2014 we could expect at most 100 persons still retained in that generation. Now, fast forward: in the chart for July 2014, about 70 persons are still retained from the (now) two-year-old generation. In other words, the project lost a much higher share of the generation during the first year than during the second one, even if we consider the latter case relative to those that were still in the project in July 2013. 284 | 285 | This is a very common pattern in most projects: they lose a large fraction of attracted persons during the first year, but are more likely to retain them after that point. This depends as well on the policies of the project, and on how people enter the community. The retention ratio for the first year usually reflects, more than anything, how difficult it is to enter the community. The more difficult it is to get in, the more engaged people usually are, and the less likely they are to leave quickly. But the more difficult it is to get in, the fewer people are going to be attracted in newer generations. Therefore, projects with different entry barriers can attract very different quantities of people, but the number of people retained after one year may be very similar. Of course, volunteers and hired developers have different entry / leave patterns too, which influence these ratios. 286 | 287 | We can also read the future a bit. Assuming the current retention rates per generation, we could estimate the size of the retention bars for the future, and from them the total size of the community with a certain experience in the project. For example, all those who, one year from now, will have stayed more than two years in the project are in the blue bars corresponding to generations currently older than one year. This allows for the prediction of shortages of developers, or of experienced developers, for example. 288 | 289 | In fact, any policy aimed at improving attraction or retention of people can be easily tracked with these aging charts, by defining the ideal charts for the future, and then comparing them with the actual ones. 290 | 291 | ## Time zones and other geographical information 292 | 293 | Knowing about the geographical location of the members of the community is difficult. In some projects, when people register in the project, they can enter some geographical information. That can be the country or city of residence, or even their coordinates. As an example, see below the map of Debian developers (well, in fact, the map of those Debian developers who specified their location).
294 | 295 | ![Map showing location of Debian developers](location-debian-devel.png) 296 | *Example of geographical information for a community: [Map showing Debian developers location](https://www.debian.org/devel/developers.loc), for those developers who registered their coordinates.* 297 | 298 | But having this level of information is unusual, and the data is usually incomplete, since it covers only community members who want to fill in this information. 299 | 300 | When a project records the IP addresses accessing its infrastructure, it can do IP geolocation on them. Since different types of access can be tracked (access to the development repositories, to the download area, to the forums, etc.), such projects can track in detail the location of different actors in the community. But again, this is unusual. Most projects don't have these capabilities, or don't want to put this tracking in place. 301 | 302 | For projects willing to have some information about the geographic location of their community, but not using the former techniques, there is still something to be done: time zone analysis. 303 | 304 | The main advantage of time zone analysis is that it uses geographic information that individuals provide when using some repositories. Well, as is usual in these cases, it is not exactly the individuals, but the software they use. The two most widespread cases are git and mailers: 305 | 306 | * git clients include the local date when commits are created. When those commits are accepted, merged into other repositories, etc., the time (including the time zone tag) is not altered in most cases. Please note that we said "in most cases": some actions on commits will alter their time, usually setting the time zone tag to that of the person performing that action. But still, the information is reliable enough to know about the time zones of commit authors. 307 | 308 | ![Git authors by timezone (OpenStack, 2014)](tz-scm-authors-2014.png) 309 | *Example of timezone analysis: Number of git authors per time zone, repositories for the OpenStack project, during 2014.* 310 | 311 | * Mailers include the local time, including time zone tags, in the messages sent. In many cases, mailing list software keeps this time. When that is the case, the analysis of mailing list repositories permits identifying the time zone of senders. 312 | 313 | ![Mail senders by timezone (Eclipse, 2013)](tz-scm-authors-2013.png) 314 | *Example of timezone analysis: Number of messages per time zone, sent to Eclipse mailing lists during 2013* 315 | 316 | In both cases, it is important to notice that there are at least three sources of trouble with this time zone analysis: 317 | 318 | * Bots that perform commits or send messages. They can have their local time zone set to whatever is convenient for the machine where they reside. Since in some projects bots can perform a lot of these actions, the number of messages or commits per time zone can be greatly affected by this. 319 | * People setting their time zone to something other than their time zone of residence. For example, frequent worldwide travellers, or persons with intense interactions with people in other time zones, may have their time zone set to UTC+0 (universal time, formerly Greenwich time). This means that the time zone corresponding to UTC+0 can be overrepresented because of this fact. 320 | * Many countries are in fact in two time zones over the year, since they change time in summer (daylight saving time).
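For git, this analysis requires very little tooling. The following sketch (assuming a local clone of the repository and the `git` command available in the path; the function name is made up for illustration) counts distinct commit authors per UTC offset using the `%ae` (author email) and `%ad` (author date) placeholders of `git log`:

```python
import subprocess
from collections import defaultdict

def authors_per_offset(repo_path):
    """Count distinct commit author emails per UTC offset (e.g. "+0200")."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%ae %ad", "--date=iso"],
        capture_output=True, text=True, check=True).stdout
    authors = defaultdict(set)
    for line in log.splitlines():
        email, date = line.split(" ", 1)        # "%ae %ad": email, then the full date
        authors[date.split()[-1]].add(email)    # ISO dates end with the offset
    return {offset: len(emails) for offset, emails in sorted(authors.items())}
```

The caveats listed above (bots, people using UTC+0, daylight saving time) apply to whatever a script like this reports, so the counts should be read as rough indicators only.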
321 | 322 | ![Map of world time zones](Worldwide_Time_Zones.png) 323 | *Map of world time zones. Original: [Worldwide Time Zones (including DST)](https://commons.wikimedia.org/wiki/File:Worldwide_Time_Zones_%28including_DST%29.png), by Phoenix B 1of3, Creative Commons Attribution-Share Alike 3.0 Unported* 324 | 325 | Due to the distribution of population on the Earth, time zone analysis provides only a very high-level glimpse of the geographical distribution of the community. There is no way of telling European from African contributors, for example, since they are in the same time zones. But you can roughly identify persons from several regions (the list is not exact, look at the map for details and a more accurate description): 326 | 327 | * UTC+12: New Zealand 328 | * UTC+10, UTC+11: Australia 329 | * UTC+9: Japan, Korea. 330 | * UTC+7, UTC+8: China, Eastern Russia, Indochina. 331 | * UTC+6: India (in fact, it is UTC+5:30). 332 | * UTC+3 to UTC+5: Western Russia, East Africa, Middle East. 333 | * UTC+0 to UTC+2: Western and Central Europe, West Africa. 334 | * UTC-2, UTC-3: Brazil, Argentina, Chile. 335 | * UTC-4 to UTC-6: North America Central and East Coast (US, Canada, Mexico), Central America, South America West Coast. 336 | * UTC-8, UTC-7: North America West Coast (US, Canada). 337 | 338 | For some uses, this split in regions is enough. For example, in the above chart about OpenStack git authors it is clear that most of the developers are from North America and Western Europe, with some participation from the Far East and other regions. The distribution of the Eclipse mail senders is even more centered on Western Europe, with a large participation from North America, and only some presence from the rest of the world. 339 | 340 | This kind of study is enough to assess the results of policies for increasing geographical participation, or to know where developers come from when deciding on a meeting location. 341 | 342 | ## Time of collaboration 343 | 344 | In global communities, knowing when people are working is important for many purposes. For example, for scheduling synchronous remote coordination meetings, for estimating to which extent work on issues may be continuous because there are people working at any time, or for learning about work timing patterns in the project. 345 | 346 | People work at different times of the day due to being in different time zones, but also due to different working habits. For example, many people working for companies tend to follow the office-hours schedule, from 8 to 5 or similar. But volunteers tend to work in their spare time, that is, in the evenings and during the night. The same can be said about the day of the week: office workers tend to be active from Monday to Friday, while volunteers tend to work also during weekends. There are similar differences in vacation periods too. 347 | 348 | These work patterns can be estimated from dates in almost any of the repositories that we can use as a data source, since activity is usually tagged with a time. This allows for very detailed analysis of when people perform that activity. 349 | 350 | ## Affiliation 351 | 352 | In FOSS communities, many developers are not working as volunteers, but as paid workers. In this case, it may be important to know for which organization each of these developers is working. Knowing it allows for several kinds of higher-level studies, such as the diversity of organizations contributing to a project, or how each organization collaborates with others.
In general, any study that can be performed at the person level can be performed at the organization level just by aggregating the activity of all their employees. 353 | 354 | The basis of these analyses is therefore identifying which organization each developer is affiliated with, if any, and during which time period. In some cases, this information is maintained to some extent by the projects themselves. For example, the Eclipse Foundation and the OpenStack Foundation maintain detailed affiliation information for all their committers. But even in those cases, there are other people, such as casual posters to mailing lists, who cannot be identified in a compulsory way, and who have little motivation to collaborate in any affiliation tracking scheme maintained by the projects. 355 | 356 | There are other techniques for tracking affiliations that may work in a certain fraction of cases. It is important to notice how this problem is related to the merging of identities, which was mentioned earlier in this chapter. Assuming that the merging of identities is already done and accurate, some of the techniques for finding affiliation information are: 357 | 358 | * Using domains in email addresses to identify companies. Not all domains are useful for this. For example, @gmail.com or @hotmail.com refer to the mailing system that the person is using, and have nothing to do with the organization for which they are working. But many other addresses, such as @redhat.com, @ibm.com or @hp.com, can easily be traced to Red Hat, IBM or HP. A specific case where this technique doesn't work is when there is a project policy or tradition of using project addresses. This is the case, for example, of Apache, with @apache.org addresses, customarily used by Apache developers in activities related to the project. Obviously, those addresses are of no use for finding affiliations. 359 | * Getting listings from involved companies. Companies contributing to a project may maintain such listings for their own interest. If they are willing to share them, they are a precious source of information for assigning affiliations. A specific case of this is when the project itself tries to track affiliation for all contributors, as was commented earlier. 360 | * Internet searches. People can usually be found in social networks, web pages, etc. From the information found in those places, in many cases their affiliation (at least their current affiliation) can be inferred. 361 | 362 | Once affiliation information is available, any community study can be done by organization. For example, the next chart shows the most active companies (by changes merged) in the OpenStack project. 363 | 364 | ![Top ten organizations in OpenStack, by number of commits (November 2014)](companies-openstack.png) 365 | *Example of organization analysis: Top ten organizations in OpenStack by number of commits and number of authors of commits for the whole history of the project up to November 2014.* 366 | 367 | ## Diversity 368 | 369 | In FOSS projects, diversity is important. Diversity ensures that dependencies on certain people, or on certain companies, are not a risk for the future of the project. Diversity ensures that control of the project is not in the hands of a few people. Diversity helps to maintain a healthy community, and lowers the barriers for new people to join the community.
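A first, very simple way of looking at this kind of diversity is to compute each organization's share of the activity, reusing the affiliation mapping discussed in the previous section. Here is a minimal sketch, assuming the commits have already been annotated with the author's organization (the data format and function name are illustrative only):

```python
from collections import Counter

def organization_shares(commits):
    """commits: iterable of (commit_id, organization) pairs.
    Returns (organization, share of commits) pairs, largest share first."""
    counts = Counter(org for _, org in commits)
    total = sum(counts.values())
    return [(org, count / total) for org, count in counts.most_common()]
```

A distribution heavily concentrated in one or two organizations is an early warning sign; the metrics below formalize the same intuition at the level of individual developers.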
370 | 371 | There are several metrics for diversity; some of them are: 372 | 373 | * [Bus factor](https://en.wikipedia.org/wiki/Bus_factor): Number of developers the project would need to lose to destroy its institutional memory and halt its progress. 374 | * [Apache Pony Factor](https://ke4qqq.wordpress.com/2015/02/08/pony-factor-math/): Diversity of a project in terms of the division of labor among its committers. 375 | 376 | ### Bus factor 377 | 378 | The bus factor tracks the concentration of unique knowledge about the software in specific developers. The name "bus factor" comes from the extreme scenario "What would happen to the project if a bus hit certain developers?". The first formulation of this question is attributed to Michael McLay, who in 1994 [asked what would happen to Python](http://legacy.python.org/search/hypermail/python-1994q2/1040.html) if Guido van Rossum (its original author, and leader of the project) were hit by a bus. 379 | 380 | If the knowledge about the project is very concentrated in a small group of developers, the project is in serious trouble if those developers leave. On the contrary, if the knowledge is evenly spread across all the developers, the project can survive the shock more easily, even if a large number of them leave. 381 | 382 | A perhaps naive, but very practical, simplification of the metric is to assume that the amount of code authored by developers is a good proxy for their knowledge of the system. Therefore, the distribution of the lines of code authored per developer in the current version of the software would allow for calculating the bus factor. 383 | 384 | A more complete view takes into account other sources of information, such as bugs fixed in certain parts of the code, participation in design and decision making, etc. 385 | 386 | In any of those cases, some simple metrics that can be defined to get a number representing the bus factor are: 387 | 388 | * Minimum number of developers who hold a certain fraction of the knowledge. For example, the minimum number of developers authoring 50% of the current version of the software. 389 | * Maximum fraction of the knowledge held by a certain number of developers. For example, the maximum fraction of source code authored by 10% of the developers. 390 | 391 | Even though these metrics are very simple, they capture a part of the "bus factor" idea. More complex metrics, taking into account the complete distribution of knowledge (or source code authorship) across developers, are possible. Factoring these metrics by the time of activity in the project is also interesting (assuming that the longer the experience in the project, the larger the knowledge). 392 | 393 | ### Apache Pony Factor 394 | 395 | The pony factor was proposed by members of the Apache Software Foundation. The first detailed explanation about it was by Daniel Gruno. It is a number that shows the diversity of a project in terms of the division of labor among its developers. From this point of view it is a concrete implementation of the bus factor. 396 | 397 | The pony factor is defined as "the lowest number of developers whose total contribution constitutes the majority of the code base". A derived metric is the augmented pony factor, which takes into account whether a developer who contributed to the code base is still active or not. In this case, contributions by inactive developers are ignored. 398 | 399 | The "contribution" considered for calculating the pony factor is the number of commits.
Therefore, we can rephrase the definition of the augmented pony factor as: 400 | 401 | >"The augmented pony factor of a project is the lowest number of active developers whose total commit count is at least 50% of the total number of commits to the project". 402 | 403 | This single number tries to capture how many active developers have "half of the knowledge on the project". The larger this number, the more diverse it is, and the more people should be affected by the bus factor for the project to experience trouble. 404 | -------------------------------------------------------------------------------- /companies-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/companies-openstack.png -------------------------------------------------------------------------------- /crs-activity-submitters-wikimedia.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/crs-activity-submitters-wikimedia.png -------------------------------------------------------------------------------- /crs-github-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/crs-github-example.png -------------------------------------------------------------------------------- /crs-openstack-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/crs-openstack-example.png -------------------------------------------------------------------------------- /evaluation-models-data/stol.py: -------------------------------------------------------------------------------- 1 | input = open("stol.txt", 'r') 2 | 3 | dictionary = { 4 | 'Name': [], 5 | 'Year': [], 6 | 'Source': [], 7 | 'Orig.': [], 8 | 'Method': [], 9 | 'Reference': [] 10 | } 11 | 12 | while 1: 13 | line = input.readline() 14 | if not line: 15 | break 16 | if line[0] == "#": 17 | index = line[2:-1] 18 | elif line == "\n": 19 | pass 20 | else: 21 | dictionary[index].append(line[:-1]) 22 | 23 | 24 | # Handling references 25 | 26 | reference_dict = {} 27 | complete_reference_dict = {} 28 | 29 | for reference in dictionary['Reference']: 30 | # looking for first author 31 | ref_words = reference.split() 32 | first_author = ref_words[1][:-1] 33 | if first_author == "de": 34 | first_author = ref_words[1] + ref_words[2][:-1] 35 | # looking for first word 36 | ref = reference.split(':') 37 | first_word = ref[1].split()[0][1:] 38 | if len(first_word) < 4: 39 | first_word = ref[1][1:].split()[1] 40 | 41 | key = first_author + "-" + first_word 42 | key = key.lower() 43 | 44 | reference_dict[ref_words[0]] = '"' 45 | complete_reference_dict[ref_words[0]] = '"' + ' '.join(ref_words[1:]).replace("'", "").replace(":", "") 46 | 47 | 48 | # Printing table 49 | 50 | print "| Name | Year | Orig | Method | Source |" 51 | print "| -------- |:----:|:-----:|:------:| ------:|" 52 | 53 | for value in range(len(dictionary['Name'])): 54 | print '|', dictionary['Name'][value], '|', 55 | print dictionary['Year'][value], '|', 56 | print dictionary['Orig.'][value], '|', 57 | print dictionary['Method'][value], '|', 58 | if ',' in 
dictionary['Source'][value]: 59 | (value1, value2) = dictionary['Source'][value].split(',') 60 | print reference_dict[value1 + ']'], ',' 61 | print reference_dict['[' + value2], '|' 62 | else: 63 | print reference_dict[dictionary['Source'][value]], '|' 64 | 65 | 66 | print 67 | print 68 | print 69 | 70 | for key in complete_reference_dict: 71 | print complete_reference_dict[key] 72 | print 73 | -------------------------------------------------------------------------------- /evaluation-models-data/stol.txt: -------------------------------------------------------------------------------- 1 | # Name 2 | Capgemini Open Source Maturity Model 3 | Evaluation Framework for Open Source Software 4 | A Model for Comparative Assessment of Open Source Products 5 | Navica Open Source Maturity Model 6 | Woods and Guliani's OSMM 7 | Open Business Readiness Rating (OpenBRR) 8 | Atos Origin Method for Qualification and 9 | Selection of Open Source Software (QSOS) 10 | Evaluation Criteria for Free/Open Source Software Products 11 | A Quality Model for OSS Selection 12 | Selection Process of Open Source Software 13 | Observatory for Innovation and Technological transfer on Open Source software (OITOS) 14 | Framework for OS Critical Systems Evaluation (FOCSE) 15 | Balanced Scorecards for OSS 16 | Open Business Quality Rating (OpenBQR) 17 | Evaluating OSS through Prototyping 18 | A Comprehensive Approach for Assessing Open Source Projects 19 | Software Quality Observatory for Open Source Software (SQO-OSS) 20 | An operational approach for selecting open source components in a software development project 21 | QualiPSo trustworthiness model OpenSource Maturity Model (OMM) 22 | 23 | 24 | # Year 25 | 2003 26 | 2004 27 | 2004 28 | 2004 29 | 2005 30 | 2005 31 | 2006 32 | 2006 33 | 2007 34 | 2007 35 | 2007 36 | 2007 37 | 2007 38 | 2007 39 | 2007 40 | 2008 41 | 2008 42 | 2008 43 | 2008 44 | 2009 45 | 46 | 47 | # Source 48 | [9] 49 | [10] 50 | [11,12] 51 | [13] 52 | [14] 53 | [15,16] 54 | [17] 55 | [18] 56 | [19] 57 | [20] 58 | [21,22] 59 | [23] 60 | [24] 61 | [25] 62 | [26] 63 | [27] 64 | [28] 65 | [29] 66 | [30,31] 67 | [32] 68 | 69 | 70 | # Orig. 71 | I 72 | R 73 | R 74 | I 75 | I 76 | R/I 77 | I 78 | R 79 | R 80 | R 81 | R 82 | R 83 | R 84 | R 85 | R 86 | R 87 | R 88 | R 89 | R 90 | R 91 | 92 | 93 | # Method 94 | Yes 95 | No 96 | Yes 97 | Yes 98 | No 99 | Yes 100 | Yes 101 | No 102 | No 103 | Yes 104 | Yes 105 | No 106 | No 107 | Yes 108 | Yes 109 | No 110 | Yes 111 | No 112 | No 113 | No 114 | 115 | # Reference 116 | [9] Duijnhouwer, F., and Widdows, C.: 'Open Source Maturity Model', Capgemini Expert Letter, 2003. 117 | [10] Koponen, T., and Hotti, V.: 'Evaluation framework for open source software'. Proc. Software Engineering and Practice (SERP), Las Vegas, Nevada, USA, June 21-24, 2004. 118 | [11] Polančič, G., and Horvat, R.V.: 'A Model for Comparative Assessment Of Open Source Products'. Proc. The 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, USA, 2004. 119 | [12] Polančič, G., Horvat, R.V., and Rozman, T.: 'Comparative assessment of open source software using easy accessible data'. Proc. 26th International Conference on Information Technology Interfaces, Cavtat, Croatia, June 7-10, 2004, pp. 673-678. 120 | [13] Golden, B.: 'Succeeding with Open Source' (Addison-Wesley, 2004). 121 | [14] Woods, D., and Guliani, G.: 'Open Source for the Enterprise: Managing Risks Reaping Rewards' (O'Reilly Media, Inc., 2005). 
122 | [15] www.openbrr.org: 'Business Readiness Rating for Open Source, RFC 1', 2005. 123 | [16] Wasserman, A.I., Pal, M., and Chan, C.: 'The Business Readiness Rating: a Framework for Evaluating Open Source', 2006, Technical Report. 124 | [17] Atos Origin: 'Method for Qualification and Selection of Open Source software (QSOS) version 1.6', 2006, Technical Report. 125 | [18] Cruz, D., Wieland, T., and Ziegler, A.: 'Evaluation criteria for free/open source software products based on project analysis', Software Process: Improvement and Practice, 2006, 11(2). 126 | [19] Sung, W.J., Kim, J.H., and Rhew, S.Y.: 'A Quality Model for Open Source Software Selection'. Proc. Sixth International Conference on Advanced Language Processing and Web Information Technology, Luoyang, Henan, China, 2007, pp. 515-519. 127 | [20] Lee, Y.M., Kim, J.B., Choi, I.W., and Rhew, S.Y.: 'A Study on Selection Process of Open Source Software'. Proc. Sixth International Conference on Advanced Language Processing and Web Information Technology (ALPIT), Luoyang, Henan, China, 2007. 128 | [21] Cabano, M., Monti, C., and Piancastelli, G.: 'Context-Dependent Evaluation Methodology for Open Source Software'. Proc. Third IFIP WG 2.13 International Conference on Open Source Systems (OSS 2007), Limerick, Ireland, 2007, pp. 301-306. 129 | [22] Assessment of the degree of maturity of Open Source open source software, http://www.oitos.it/opencms/opencms/oitos/Valutazione_di_prodotti/Modello1.2.pdf. 130 | [23] Ardagna, C.A., Damiani, E., and Frati, F.: 'FOCSE: An OWA-based Evaluation Framework for OS Adoption in Critical Environments'. Proc. Third IFIP WG 2.13 International Conference on Open Source Systems, Limerick, Ireland, 2007, pp. 3-16. 131 | [24] Lavazza, L.: 'Beyond Total Cost of Ownership: Applying Balanced Scorecards to OpenSource Software'. Proc. International Conference on Software Engineering Advances (ICSEA) Cap Esterel, French Riviera, France, 2007, pp. 74-74. 132 | [25] Taibi, D., Lavazza, L., and Morasca, S.: 'OpenBQR: a framework for the assessment of OSS'. Proc. Third IFIP WG 2.13 International Conference on Open Source Systems (OSS 2007), Limerick, Ireland, 2007, pp. 173-186. 133 | [26] Carbon, R., Ciolkowski, M., Heidrich, J., John, I., and Muthig, D.: 'Evaluating Open Source Software through Prototyping', in St.Amant, K., and Still, B. (Eds.): 'Handbook of Research on Open Source Software: Technological, Economic, and Social Perspectives' (Information Science Reference, 2007), pp. 269-281. 134 | [27] Ciolkowski, M., and Soto, M.: 'Towards a Comprehensive Approach for Assessing Open Source Projects': 'Software Process and Product Measurement' (Springer-Verlag, 2008). 135 | [28] Samoladas, I., Gousios, G., Spinellis, D., and Stamelos, I.: 'The SQO-OSS Quality Model: Measurement Based Open Source Software Evaluation'. Proc. Fourth IFIP WG 2.13 International Conference on Open Source Systems (OSS 2008), Milano, Italy, 2008. 136 | [29] Majchrowski, A., and Deprez, J.: 'An operational approach for selecting open source components in a software development project'. Proc. 15th European Conference, Software Process Improvement (EuroSPI), Dublin, Ireland, September 3-5, 2008. 137 | [30] del Bianco, V., Lavazza, L., Morasca, S., and Taibi, D.: 'Quality of Open Source Software: The QualiPSo Trustworthiness Model'. Proc. Fifth IFIP WG 2.13 International Conference on Open Source Systems (OSS 2009), Skövde, Sweden, June 3-6, 2009. 
138 | [31] del Bianco, V., Lavazza, L., Morasca, S., and Taibi, D.: 'The observed characteristics and relevant factors used for assessing the trustworthiness of OSS products and artefacts', 2008, Technical Report no. A5.D1.5.3. 139 | [32] Petrinja, E., Nambakam, R., and Sillitti, A.: 'Introducing the OpenSource Maturity Model'. Proc. ICSE Workshop on Emerging Trends in Free/Libre/Open Source Software Research and Development (FLOSS '09), Vancouver, Canada, 2009, pp. 37-41. 140 | -------------------------------------------------------------------------------- /evaluation-usage-netcraft.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/evaluation-usage-netcraft.png -------------------------------------------------------------------------------- /evaluation-usage-statcounter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/evaluation-usage-statcounter.png -------------------------------------------------------------------------------- /evaluation-usage-w3counter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/evaluation-usage-w3counter.png -------------------------------------------------------------------------------- /evaluation_dashboards.md: -------------------------------------------------------------------------------- 1 | # Evaluation dashboards 2 | 3 | There are several software products or services that provide dashboards useful for evaluation. Each of them has different characteristics, and is useful in different parts of the evaluation process, or for different areas of evaluation. In this chapter we introduce some of them. 4 | 5 | ## The Open Hub dashboard 6 | 7 | [Open HUB](http://openhub.net), formerly known as Ohloh, is a website maintained by BlackDuck. Among other services, it provides a software development dashboard for a very large collection of projects (at least, tens of thousands of them). 8 | 9 | All the information provided by Open HUB is based on the analysis of the SCM. It includes both the analysis of the contents (licensing, lines of code, programming languages) and of the metainformation (commits, committers, etc). 10 | 11 | ![Open HUB dashboard for Liferay Portal, main page](openhub-liferay.png) 12 | *Main page of the Open HUB dashboard for Liferay Portal, as of July 2015.* 13 | 14 | The dashboard offers several views, each showing a different aspect of the project. The main page is a summary of the parameters of the project, and in some cases provides enough parameters for a basic assessment of its activity, and of the size of the code produced. Then, there are specific panels showing analysis by programming language, and by lines of code, activity and developers. 15 | 16 | ![Open HUB dashboard for Liferay Portal, commits panel](openhub-liferay-commits.png) 17 | *Commits panel of the Open HUB dashboard for Liferay Portal, as of July 2015.* 18 | 19 | The main interest of Open HUB is probably the huge number of projects for which they offer information. It is very likely that, if you look for any even minimally known project, Open HUB maintains a dashboard for it.
20 | 21 | The main problems with this dashboard lie in the lack of support for data sources other than the SCM, in its relative simplicity, and in the fact that, being proprietary software, only they can improve and customize it. 22 | 23 | ## GitHub statistics 24 | 25 | GitHub provides some statistics for the repositories it hosts. Most of it is based on the activity in the git repository, although it provides some basic data about issues and pull requests as well. 26 | 27 | ![GitHub dashboard for OpenStack Nova, contributors panel](github-contributors.png) 28 | *Contributors panel of the GitHub dashboard for OpenStack Nova, as of July 2015* 29 | 30 | Even when they don't use the term "dashboard", they provide a simple one. It includes information on contributors and activity, with a focus on the historical evolution of contributions. 31 | 32 | ![GitHub dashboard for OpenStack Nova, activity punchcard](github-punchcard.png) 33 | *[Activity punchcard for OpenStack Nova](https://github.com/openstack/nova/graphs/punch-card), shown by the GitHub dashboard, as of July 2015* 34 | 35 | The dashboards provided by GitHub are interesting because, even being simple, they are available for all the repositories the site hosts. And they host most of the FOSS being developed these days. Therefore, as in the case of Open HUB, it is very likely that if you need some simple metrics for almost any project, you can find a repository in GitHub with it, and therefore a dashboard providing it. 36 | 37 | On the other hand, some of the main drawbacks are that the metrics provided are just a few, and in general simple, that they don't provide aggregated metrics, for example at the level of a whole GitHub organization, and that they are proprietary software, meaning that only they can improve their dashboard. 38 | 39 | ## Stackalytics 40 | 41 | [Stackalytics](http://mirantis.com) is a service provided by Mirantis to the OpenStack community. It is based on FOSS, and for that reason it has been considered by other communities to gather and visualize their development metrics. 42 | 43 | ![Stackalytics dashboard, main view](stackalitics-main.png) 44 | *Main view of the Stackalytics dashboard for OpenStack, as of July 2015.* 45 | 46 | The Stackalytics dashboard shows a summary of the activity and the community of the project, including the evolution of several parameters over time, and the current shares of contribution for companies, subprojects and developers. 47 | 48 | ![Stackalytics dashboard, view for Mirantis](stackalytics-mirantis.png) 49 | *View for the [activity of Mirantis, Stackalytics dashboard for OpenStack](http://stackalytics.com/?company=mirantis), as of July 2015.* 50 | 51 | Stackalytics is tailored to the specific needs of the OpenStack community. It provides information not only about commits in the git repositories, but also about tickets, code review and mail messages, making it a comprehensive tool that assists in the understanding of the OpenStack project. Being FOSS, the tool can be adapted to special needs, and is in fact being considered by other communities for providing a dashboard service to their developers. 52 | 53 | In its current form, its main drawback is related to its main feature: being specifically tailored to the needs and characteristics of OpenStack, it can be difficult to adapt to other projects.
54 | 55 | 56 | ## Grimoire Dashboard 57 | 58 | Grimoire is a software system designed to retrieve information from software development repositories, store it in a database, and then use that information for producing dashboards and other visualizations. Grimoire is FOSS, and can be adapted to many different needs, one of them being the Grimoire dashboard. 59 | 60 | ![Grimoire dashboard for OpenStack](grimoire-openstack.png) 61 | *Main page of the [Grimoire dashboard for OpenStack](http://activity.openstack.org), as of July 2015* 62 | 63 | Grimoire is capable of extracting information from many different kinds of repositories related to software development. The dashboard uses the information retrieved from them in several panels, which show activity, community data, analysis of processes, diversity reports, and other kinds of studies. 64 | 65 | ![Grimoire dashboard, panel for Wikimedia tickets](grimoire-tickets.png) 66 | *Tickets panel for the [Grimoire Dashboard for the Wikimedia Foundation projects](http://korma.wmflabs.org), as of July 2015.* 67 | 68 | Grimoire is offered as a maintained service by Bitergia, but being FOSS it can also be deployed and customized by anyone. Currently one of its main drawbacks is that it is not easy to deploy, and that the capabilities for interacting with the data provided are limited. These problems are expected to be mitigated by the new generation of Grimoire, which is under development during Summer 2015. 69 | 70 | ![Grimoire dashboard (new generation) for OpenStack](grimoireng-openstack.png) 71 | *Main view of the new generation of the Grimoire dashboard for OpenStack, in a [preview of work in progress](https://projects.bitergia.com/previews/ng/), July 2015.* -------------------------------------------------------------------------------- /evaluation_models.md: -------------------------------------------------------------------------------- 1 | # Evaluation models 2 | 3 | This chapter describes some of the evaluation models that can be used for FOSS projects. We will focus on quantitative models, since they are easier to apply and replicate, and, probably, more useful. This does not mean that qualitative models are not important. But since they are more based on qualitative perception, they are very dependent on the expertise and familiarity of the expert performing the evaluation. Quantitative models try to be more independent from the person doing the analysis, by defining quantitative data that tries to capture the relevant aspects of the project being evaluated. 4 | 5 | Of course, there are shades in a continuum ranging from pure quantitative to pure qualitative. In fact, the models mentioned in this section may have some aspects which are at least partially qualitative. 6 | 7 | ## Basics of quantitative evaluation 8 | 9 | Quantitative evaluation is based on the identification of quantitative parameters that can be significant, and the definition of measurement models for them. 10 | 11 | Given the number of evaluation models that exist, [Stol and Ali Babar](#bib:stol-babar-comaprison-models) have proposed a comparison framework to evaluate them. In order to do so, the most relevant evaluation models were identified. The result of this identification process, after screening around 550 research papers, is provided in the following table, with 20 approaches. The column "Orig" shows if the initiative is the result of a research (R) or an industrial (I) effort. Models have been classified as industrial if they are associated with at least one company.
The column "Method" indicates the completeness of the methodology, meaning that if all required activities, tasks, inputs and outputs are outlined, the assessment methodology offers a complete guide to evaluation. If a mere set of evaluation criteria is proposed, then the authors labeled the methodology as not complete. 12 | 13 | | Name | Year | Orig | Method | 14 | | -------- |:----:|:-----:|:------:| 15 | | Capgemini Open Source Maturity Model | 2003 | I | Yes | 16 | | Evaluation Framework for Open Source Software | 2004 | R | No | 17 | | A Model for Comparative Assessment of Open Source Products | 2004 | R | Yes | 18 | | Navica Open Source Maturity Model | 2004 | I | Yes | 19 | | Woods and Guliani's OSMM | 2005 | I | No | 20 | | Open Business Readiness Rating (OpenBRR) | 2005 | R/I | Yes | 21 | | Atos Origin Method for Qualification and | 2006 | I | Yes | 22 | | Selection of Open Source Software (QSOS) | 2006 | R | No | 23 | | Evaluation Criteria for Free/Open Source Software Products | 2007 | R | No | 24 | | A Quality Model for OSS Selection | 2007 | R | Yes | 25 | | Selection Process of Open Source Software | 2007 | R | Yes | 26 | | Observatory for Innovation and Technological transfer on Open Source software (OITOS) | 2007 | R | No | 27 | | Framework for OS Critical Systems Evaluation (FOCSE) | 2007 | R | No | 28 | | Balanced Scorecards for OSS | 2007 | R | Yes | 29 | | Open Business Quality Rating (OpenBQR) | 2007 | R | Yes | 30 | | Evaluating OSS through Prototyping | 2008 | R | No | 31 | | A Comprehensive Approach for Assessing Open Source Projects | 2008 | R | Yes | 32 | | Software Quality Observatory for Open Source Software (SQO-OSS) | 2008 | R | No | 33 | | An operational approach for selecting open source components in a software development project | 2008 | R | No | 34 | | QualiPSo trustworthiness model OpenSource Maturity Model (OMM) | 2009 | R | No | 35 | 36 | Of all these models, we have selected some that we describe in some more detail in the following sections. 37 | 38 | ## OpenBRR 39 | 40 | The OpenBRR (Open Business Readiness Rating) is an evaluation method proposed in 2005 and sponsored most notably by Carnegie Mellon and some industrial partners (CodeZoo, SpikeSource and Intel) [The OpenBRR white paper](http://docencia.etsit.urjc.es/moodle/mod/resource/view.php?id=4343). The goal of this method is to provide an objective way to assess community-driven projects, offering a final quantitative mark that is intended to provide a measure of their readiness to be deployed in a business environment. 41 | 42 | The following figure provides an overview of how OpenBRR should be applied. As can be seen, OpenBRR involves a multi-step evaluation process, which can be adjusted by the evaluator to adapt the assessment to the specific needs of the organization that wants to deploy the 43 | software under study. 44 | 45 | ![The OpenBRR evaluation process](openbrr.jpg) 46 | 47 | OpenBRR is based on gathering metrics and factual data on the following categories: 48 | 49 | * Functionality 50 | * Usability 51 | * Quality 52 | * Security 53 | * Performance 54 | * Scalability 55 | * Architecture 56 | * Support 57 | * Documentation 58 | * Adoption 59 | * Community 60 | * Professionalism 61 | 62 | For each category, a set of criteria and metrics is proposed. These inputs are then weighted and each of the above introduced categories is given a rating that ranges from 1 to 5.
Then, depending on the final use the software will be given, adopters may weight these categories, obtaining an overall rating of the project. Hence, not all categories are weighted equally, and for some scenarios a category may not be considered at all for the final rating (in that case, its weight factor would be 0%). 63 | 64 | To help in the assessment, OpenBRR offers a [spreadsheet template](http://docencia.etsit.urjc.es/moodle/mod/resource/view.php?id=4350) that can be used in the evaluation process. Many of the input data in this model are to be obtained by external tools or from the Internet. As an example, the quality category considers the following inputs: 65 | 66 | * Number of minor releases in past 12 months 67 | * Number of point/patch releases in past 12 months 68 | * Number of open bugs for the last 6 months 69 | * Number of bugs fixed in last 6 months (compared to # of bugs opened) 70 | * Number of P1/critical bugs opened 71 | * Average bug age for P1 in last 6 months 72 | 73 | These inputs are rated as well from 1 to 5, and the evaluator may then weight them in a later step. 74 | 75 | [Udas et al.](#bib:udas-apples) discuss in a report how to apply OpenBRR in real environments, based on their experience in the evaluation of Learning Management Systems. The 31-page report is very exhaustive and provides some general guidelines to be followed when using OpenBRR. It also gives an idea of how difficult and time-consuming it is. 76 | 77 | The OpenBRR website provided a set of examples of use of the evaluation model. Of these, the best known assessed Moodle and Sakai, two well-known learning management systems that were widely used in industry and academic institutions. As they introduce the OpenBRR assessment process very well, we will show them here in detail. You can browse the [OpenBRR spreadsheet for Moodle](http://gsyc.es/~grex/evaluating/BRR_Worksheet_25Jul05_Moodle.sxc) and the [OpenBRR spreadsheet for Sakai](http://gsyc.es/~grex/evaluating/BRR_Worksheet_25Jul05_Sakai.sxc) for more details. 78 | 79 | The first step in the process is to select and weigh the criteria to be used in the evaluation process. In the case of Moodle and Sakai, the evaluators chose to use the following: 80 | 81 | | Rank | Category | Weight | 82 | | ----:|:------------- | -------:| 83 | | 1 | Functionality | 25% | 84 | | 2 | Usability | 20% | 85 | | 3 | Documentation | 15% | 86 | | 4 | Community | 12% | 87 | | 5 | Security | 10% | 88 | | 6 | Support | 10% | 89 | | 7 | Adoption | 8% | 90 | | Total | | 100% | 91 | 92 | leaving out five criteria: Quality, Performance, Scalability, Architecture, and Professionalism. 93 | 94 | In the following step, each criterion is evaluated on its own. As an example, for the evaluation of functionality, a list of 27 standard functionality items (ranging from discussion forums to surveys or automatic testing) is used, which has been obtained from the edutools.info on-line portal. Depending on its degree of completeness, each functionality is scored and weighted from 1 to 3, as shown in the following table. An additional 8 extra functionalities (such as LaTeX support or the inclusion of video) are rated in the same fashion.
95 | 96 | | Weight & Test Score Specification | Score | 97 | |:--------------------------------- | -----:| 98 | | Very important | 3 | 99 | | Somewhat important | 2 | 100 | | Not important | 1 | 101 | 102 | In order to obtain a total score for the functionality criteria, the total weights of the standard functionality items is summed up in W. Then the score for the assessed tool is obtained by adding all the scores, both from the standard and extended functionality, as T. Depending on the completeness of T related to W (in percentage), a final rating score is provided, using the cutoff values provided in following table. 103 | 104 | | Rating Score Table | Percentage Cutoff | Score | 105 | |:------------------ | -----------------:| -----:| 106 | | Unacceptable | 0% | 1 | 107 | | Poor | 65% | 2 | 108 | | Acceptable | 80% | 3 | 109 | | Good | 90% | 4 | 110 | | Excellent | 96% | 5 | 111 | 112 | In our case studies, Sakai obtains a 3 out of 5 (its percentage is 86.67%, as it has a total score of 52 out of a total weight of 60), while Moodle obtains 5 out of 5 (its percentage is 106.67% with a total score of 64 out of a a total weight of 60. 113 | 114 | Once this is done with all evaluation criteria, the score of each of the criteria is introduced in a spreadsheet and the final score is calculated. It should be noted that when doing so the 115 | previously defined weights are to be taken into consideration. For instance, the results of this step is provided in the following table for Moodle and Sakai. The total score of 4.19 for Moodle and of 3.23 for Sakai is finally obtained by summing up all the weighted scores for 116 | each of the categories. 117 | 118 | | Rank | Category | Moodle Unweighted | Sakai Unweighted | Weight | Moodle Weighted | Sakai Weighted | 119 | |:-- | --------:| -----------------:| ----------------:| ------:| ---------------:| --------:| 120 | | 1 | Functionality | 5 | 3 | 25% | 1.25 | 0.75 | 121 | | 2 | Usability | 4 | 4 | 20% | 0.8 | 0.8 | 122 | | 8 | Quality | 0 | 0 | 0% | 0 | 0 | 123 | | 5 | Security | 4.2 | 3.4 | 10% | 0.42 | 0.34 | 124 | | 9 | Performance | 0 | 0 | 0% | 0 | 0 | 125 | | 10 | Scalability | 0 | 0 | 0% | 0 | 0 | 126 | | 11 | Architecture | 0 | 0 | 0% | 0 | 0 | 127 | | 6 | Support | 4 | 1.5 | 10% | 0.4 | 0.15 | 128 | | 3 | Documentation | 3.1 | 3.1 | 15% | 0.47 | 0.47 | 129 | | 7 | Adoption | 4.4 | 4.2 | 8% | 0.35 | 0.34 | 130 | | 4 | Community | 4.2 | 3.2 | 12% | 0.5 | 0.38 | 131 | | 12 | Professionalism | 0 | 0 | 0% | 0 | 0 | 132 | 133 | 134 | Although OpenBRR is one of the most known assessment models, it has not achieved to create a thriving community and currently it seems to have come to a halt. 135 | 136 | ## QSOS 137 | 138 | QSOS (Qualification and Selection of Open Source software) is an assessment methodology proposed by ATOS Origin in 2004 and updated in 2013. It is composed of a formal method that describes a workflow to evaluate projects, a set of tools that help to apply the QSOS workflow and a community. The proposed process is shown in the figure below. It is divided in four iterative steps and is iterative in nature, meaning that it can be applied with different granularity levels, becoming more detailed in every iteration. 139 | 140 | ![The QSOS evaluation process](qsos.png) 141 | 142 | The first step is concerned with defining the evaluation criteria in three axes: type of software, type of license and type of community. 
The type of software axis is composed of two additional criteria: a maturity analysis and a functional coverage analysis. The next figure shows a diagram with the specific items that are to be considered when assessing the maturity of a project. These items can be obtained, in general, from any free software project. 143 | 144 | ![QSOS criteria to assess the maturity of a project](qsos-maturity.png) 145 | 146 | The second item for the type of software is related to the functionality of the project and depends on the software domain. 147 | 148 | The type of license criterion evaluates the software licenses for three aspects: whether the license is a copyleft license, whether copyleft is bounded only to the module level, and whether the license allows adding restrictions. 149 | 150 | Finally, the type of community criterion addresses the kind of community behind the project: 151 | 152 | * A single developer working on his own on the project 153 | * A group of developers, without formal processes 154 | * A developer organization with a formalized and respected software life cycle, roles and a meritocratic structure 155 | * A legal entity (such as a foundation) that manages the community and acts as a legal umbrella for the project 156 | * A commercial entity: a company that employs some of the core developers and tries to obtain revenues from the development of the project 157 | 158 | 159 | The second step involves the evaluation of the projects by obtaining data and measures from the project; raw evaluations are the output of this step. For each of the criteria, a score between 0 and 2 is given. The following table provides the scoring rule in the case of the assessment of functionality: 160 | 161 | | Score | Description | 162 | |:----- |:----------- | 163 | | 0 | Functionality not covered | 164 | | 1 | Functionality partially covered | 165 | | 2 | Functionality fully covered | 166 | 167 | The results of the second step are then weighted depending on the context and the requirements under which the software will be used; specifically, this is done by setting weights and filters 168 | in advance. So while in the first step, for instance, all functionality may be assessed independently of its importance for adoption and deployment, in this step the degree of relevance of each functional aspect will be translated into a weighting value. In the case of functionality, this means that functionalities may be considered required, optional or not required. 169 | 170 | The final step is the selection of the most relevant software solution, by comparing the results obtained by several candidate software projects. QSOS offers two different modes of selection: strict and loose selection. Within the loose selection process, all software projects under assessment are evaluated for all criteria, and obtain a final rating. Within the strict selection process, as soon as a software does not comply with a relevant criterion of the evaluator, it is eliminated from the evaluation process. So, for instance, if a software does not provide the required functionalities it is not further considered. It should be noted that with the strict selection procedure, and depending on the demands of the user, it may happen that no software meets the conditions. 171 | 172 | The final result of QSOS can be shown and compared graphically by several means. One of them is using a radar format, as shown in the next figure. 173 | 174 | ![QSOS ](qsos-radar.png) 175 | 176 | The QSOS framework offers a set of tools that help users follow the assessment process.
Among them, there is an editor (the Freemind well-known mind-mapping tool) the create evaluation templates. These templates can then be used for the evaluation of a project using a Firefox extension or a stand-alone application. QSOS offers a web backend service where templates and evaluations can be made public and shared. Finally, O3S is a web-based tool that allows to manipulate evaluations, perform comparisons and export them in various formats. 177 | 178 | ![The QSOS evaluation process](qsos-tools.png) 179 | 180 | Finally, the QSOS is an open project by itself, offering support to users and acting as a repository of templates and evaluations in several languages. 181 | 182 | 213 | 214 | -------------------------------------------------------------------------------- /factoids-openhub-git.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/factoids-openhub-git.png -------------------------------------------------------------------------------- /functional-evaluation-loaoo-features-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/functional-evaluation-loaoo-features-2.png -------------------------------------------------------------------------------- /functional-evaluation-loaoo-features.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/functional-evaluation-loaoo-features.png -------------------------------------------------------------------------------- /functional-evaluation-loaoo-model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/functional-evaluation-loaoo-model.png -------------------------------------------------------------------------------- /github-contributors.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/github-contributors.png -------------------------------------------------------------------------------- /github-punchcard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/github-punchcard.png -------------------------------------------------------------------------------- /grimoire-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/grimoire-openstack.png -------------------------------------------------------------------------------- /grimoire-tickets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/grimoire-tickets.png -------------------------------------------------------------------------------- /grimoire.md: -------------------------------------------------------------------------------- 1 | # Quantitative evaluation 
with Grimoire 2 | 3 | [This chapter is still to be written] 4 | 5 | ## Data retrieval 6 | 7 | MetricsGrimoire 8 | 9 | ## The database 10 | 11 | ## Analysis and visualization 12 | 13 | vizGrimoire 14 | 15 | ### Analysis library 16 | 17 | GrimoireLib 18 | 19 | ### Visualization library 20 | 21 | vizGrimoire JavaScript library. 22 | 23 | ### The Grimoire dashboard 24 | 25 | ## Automation of the whole process 26 | 27 | Automator 28 | 29 | ## Some details 30 | 31 | ### Unique identities 32 | 33 | ### Affiliations 34 | 35 | ### Project hierarchies 36 | -------------------------------------------------------------------------------- /grimoire_use.md: -------------------------------------------------------------------------------- 1 | # Using Grimoire 2 | 3 | [This chapter is still to be written] 4 | 5 | Examples of complete cases of use of Grimoire to analyze real projects. 6 | 7 | ## Analysis of a git repository 8 | 9 | ## Analysis of a GitHub project 10 | 11 | ## Analysis of a medium project 12 | 13 | ## Analysis of a large project -------------------------------------------------------------------------------- /grimoireng-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/grimoireng-openstack.png -------------------------------------------------------------------------------- /introduction.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | Free / open source software projects may be very different from each other. Evaluating them, either to compare them or just to understand better how to deal with them, is currently a kind of black art. This book intends to compile some information that helps provide background, methodologies and procedures for converting it into predictable engineering. Time will tell if we succeeded in our effort. 4 | 5 | The book is structured in three main blocks: 6 | 7 | * The first includes the chapters "Before evaluating" and "Sources of information". It provides the basic background for understanding what the evaluation of free, open source software (FOSS) projects is; how it can be done; and what can be expected from it. 8 | * The second deals with the specifics of the different aspects of evaluating FOSS projects. It includes three chapters: "Evaluating the community", "Evaluating development processes", and "Evaluation models". 9 | * The third block is the practical one, presenting some tools for assisting in the evaluation. It also includes three chapters: "Evaluation dashboards", "Quantitative evaluation with Grimoire", and "Using Grimoire". The last two show the Grimoire technology as an example of how tools can assist in evaluation -especially quantitative evaluation- of FOSS development projects. 10 | 11 | This book is a collaborative effort. We, the original authors, like to see ourselves as the facilitators of a large endeavor: the evolution, improvement and maintenance of the book over time. This can be done only with the collaboration of readers, experts in the field, and, in general, anyone interested in the book. To make it easier to contribute, we decided to explore the [GitBook](http://gitbook.com) technology, which is a [FOSS project itself](https://github.com/GitbookIO/gitbook).
12 | 13 | Therefore, this book is [available from GitBook](https://www.gitbook.com/book/jgbarah/evaluating-foss-projects), which mirrors it from a [GitHub repository](https://github.com/jgbarah/evaluating-foss-projects/). Different versions of the book (for reading online, for downloading in several formats) can be obtained from GitBook, while the "source code" (based on Markdown files) can be obtained in the GitHub repository. Contributions are accepted in several ways, the most convenient being using GitHub pull requests to that repository. You are encouraged to fork that repository and submit your pull requests suggesting improvements, if you want to collaborate. You can also use GitHub issues in the same repository to report any problem or make any suggestion. 14 | 15 | We, the authors, hope that you find this book useful, and that it contributes to make your life easier and happier. 16 | 17 | 18 | -------------------------------------------------------------------------------- /its-activity-closers-cloudstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/its-activity-closers-cloudstack.png -------------------------------------------------------------------------------- /its-bugzilla-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/its-bugzilla-example.png -------------------------------------------------------------------------------- /its-bugzilla-workflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/its-bugzilla-workflow.png -------------------------------------------------------------------------------- /its-github-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/its-github-example.png -------------------------------------------------------------------------------- /kinds.md: -------------------------------------------------------------------------------- 1 | # Before evaluating 2 | 3 | In this chapter we deal with some preliminary issues, then describe how to evaluate specific aspects of FOSS products and projects. 4 | 5 | ## Steps in evaluation 6 | 7 | Most evaluation methods follow the following steps: 8 | 9 | * Conduct market research to decide which products or projects to evaluate. 10 | * Define the evaluation criteria. 11 | * Perform the actual evaluation, producing the evaluation results. 12 | 13 | You can perform the first two in any order. Both lead to the actual evaluation. In some cases, the process is iterative: 14 | 15 | * you select some subjects and evaluate them according to certain criteria 16 | * based on the results, you refine the criteria and redefine the list of subjects 17 | * you perform a new evaluation, which leads to more precise criteria and subjects, until you obtain enough data for a final decision 18 | 19 | ### Market research 20 | 21 | Before starting the actual evaluation, define the set of subjects to evaluate. This calls for extensive research on the potential subjects. This research is not only a matter of identifying all your options. 
It would be desirable to find all the products or projects that could be the most appropriate, but this is usually impractical. A more cost-effective approach is to perform extensive market research, then follow some criteria to produce a short list for evaluation. 22 | 23 | The length of this list will depend on the resources available for the evaluation, the cost per subject of that evaluation, and the expected benefits. 24 | 25 | In some cases the subjects are already decided beforehand, and you can skip this step. This happens, for example, when a community decides to evaluate itself, or when a company wants to evaluate the products it relies upon to deliver services. 26 | 27 | ### Defining evaluation criteria 28 | 29 | The evaluation criteria will determine the information to be obtained after the evaluation process. It is important to tie those criteria to the evaluation's objectives. Make this relationship explicit. One way is to start out by clearly stating the objectives; map them to specific characteristics of the evaluation subject; and, finally, define a procedure to evaluate those characteristics. Evaluating some products in order to select the most mature one is different from choosing the product that has improved most over a long period. 30 | 31 | Mapping objectives to characteristics is not easy, because it is not always obvious which characteristics will have a greater impact on the objectives. For example, if the objective is to select a project whose community is easy to join, which characteristics would you evaluate? The length of time it takes for newcomers to reach the core development team, perhaps? The number of contributions already performed by newcomers? The details of formal policies encouraging newcomer participation? The friendliness of conversations in mailing lists and forums in which newcomers participate? 32 | 33 | To map objectives to characteristics, you need to use a model of how the latter lead to the former. Models can range from informal, simple, and rule-of-thumb to formal, theory-backed, and empirically-tested ones. In any case, expertise on how characteristics of projects and products may influence the objectives is essential. 34 | 35 | Defining procedures for evaluating characteristics can also be tricky. You need to know what can be evaluated, and how you can apply that knowledge to the characteristics of interest. 36 | 37 | It's uncommon to find an exact evaluation of your characteristic of choice. For example, how can you characterize expertise in a development community? At least two dimensions are involved: the expertise of individuals, and how that expertise permeates the community so that newcomers can capitalize upon it. Both dimensions are difficult to evaluate. You usually need to rely on proxies, such as the average length of time developers remain in the community, and how experienced people collaborate with newcomers to solve issues and make decisions. 38 | 39 | For all these reasons, it's hard to start from scratch when evaluating. It is much better to find an evaluation model that fits our needs, and just map it to our specific objectives. A large part of this text will describe some existing models that you will hopefully find useful. 40 | 41 | In addition, it is convenient to explain how these models can be produced in a systematic way. For that we will introduce the goal-question-metric (GQM) method in a [later section](#sec:gqm). 
It will serve both as an illustration of how models can be produced, and as a tool to produce new models. You can use GQM not only to derive your own models, if you have the expertise and resources, but also to adapt existing ones to your specific environment. 42 | 43 | ### Performing an evaluation 44 | 45 | Once the characteristics to evaluate are clear, and the methods to evaluate them too, you can start the actual evaluation. Depending on how you defined the evaluation methods, several actions may be performed, such as: 46 | 47 | * Surveys, for example to learn about quality as perceived by users, or about developers' estimations of the effort involved in their own work. 48 | * Interviews with experts, for example to learn how mature a certain product is considered to be. 49 | * Study of documentation, to learn about how a product interoperates with others, and to what extent that is described. 50 | * Analysis of source code, to characterize some quality parameters. 51 | * Analysis of messages in mailing lists, to characterize the flow of information in the project. 52 | * Study of the project bylaws, to determine how formal decisions are taken. 53 | 54 | These are just examples: many other actions are possible. But whatever the case, with the information gathered from those actions, the evaluation is performed. 55 | 56 | ### Evaluation results 57 | 58 | Depending on the objectives of the evaluation, one of three kinds of resulting information is provided in most evaluation models: 59 | 60 | * Tags. These are binary-valued parameters that result from the evaluation. For example, for a certain definition of "mature", a project may be defined as "mature" or "not mature". 61 | * Scales. Parameters with values that are numbers or elements from a finite set. For example, a scale can be defined from 0 to 100, to show how close the value is to "100%": a parameter "closed-bugs" could be "78%", meaning that of all bugs reported during a certain period, 78% were closed. Or with a real or integer number. For example, "median of time to close" can be defined to be "178", meaning that the median of time to close a certain set of tickets is 178 hours. Or with values in a set of strings. For example, "maturity" could have values in the set "mature", "close to maturity", "immature". 62 | * Metadata. This is usually detailed information about the parameter, from which either scales or tags can usually be produced. For example, detailed metadata for a parameter could be a list of its main statistics, or even a complete list of all its values. That way, from detailed metadata on "time to close", consisting of the time to close for all tickets, the above mentioned scale "median of time to close" could be inferred. 63 | 64 | In addition, free text evaluation can be useful as well, such as the detailed analysis by an expert. 65 | 66 | A very specific case of free text evaluation is the factoid. Factoids are predefined pieces of text that describe some quantitative situation in natural language. They are selected based on the results of the quantitative evaluation, but have the appearance of free text. To some extent, they can be better for novices, since they provide an explanation in "common words" of the tags, numbers or scales. 67 | 68 | ![Factoids shown by OpenHub for the git project](factoids-openhub-git.png) 69 | *Factoids shown by OpenHub for the git project, circa June 2015* 70 | 71 | We can classify the results of the evaluation process into two categories: 72 | 73 | * Quantitative evaluations. 
They produce a quantitative description of the evaluated characteristic. Tags, scales and metadata are cases of quantitative evaluation. 74 | * Qualitative evaluations. They produce a description of the quality of the evaluated characteristic. Free text evaluations produced by an expert are an example of qualitative evaluations. 75 | 76 | Qualitative evaluations can be converted into quantitative ones by using the descriptions to select from a scale of values. That allows for easier comparison, but usually some information is lost: the nuances and details that qualitative descriptions provide. 77 | 78 | Quantitative evaluations can be converted into qualitative ones by producing the already mentioned, predetermined "factoids". That allows for easier interpretation, but it is convenient to remember that those are "synthesized" qualities: the underlying information is quantitative, and the derived factoids are just descriptions of those quantities. 79 | 80 | NOTE: TODO. Example of both conversions 81 | 82 | ## Goal-question-metric 83 | 84 | 85 | The baseline rationale of the goal-question-metric (GQM) method for determining the characteristics of subjects to evaluate, and how to evaluate them, can be summarized as follows: 86 | 87 | > "The Goal Question Metric (GQM) approach is based upon the assumption that for an organization to measure in a purposeful way it must first specify the goals for 88 | itself and its projects, then it must trace those goals to the data that are intended to define those 89 | goals operationally, and finally provide a framework for interpreting the data with respect to the stated goals." 90 | 91 | > [The Goal Question Metric Approach](bib:basili-gqm). 92 | 93 | In other words, you first have to state the goals of the evaluation. Then, you have to map those goals to characteristics of the subject of evaluation, and how they are going to be evaluated. Finally, you have to find out how to interpret the results of the evaluation with respect to the intended objectives. 94 | 95 | Using the terms proposed by GQM, you have to define: 96 | 97 | * Goals, at the conceptual level. 98 | * Questions, at the operational level. 99 | * Metrics, at the quantitative level. 100 | 101 | ### Definition of goals 102 | 103 | ### Definition of questions 104 | 105 | ### Definition of metrics 106 | 107 | ## What is different in FOSS evaluation 108 | 109 | Evaluation of FOSS products is different for the following reasons: 110 | 111 | * The easy access to the product to evaluate. 112 | * The quantity and quality of available information 113 | * The importance of the community 114 | * The special case of open development 115 | * The competing market for deep support 116 | 117 | ### Access to the product 118 | 119 | In the case of non-FOSS, the first barrier to evaluation is access to the product. For FOSS, the evaluator is usually one download away from evaluating any FOSS product which is adequately packaged. For non-FOSS, just accessing the product may mean signing a contract, paying for a regular non-exclusive license, or obtaining a usually limited evaluation version. 120 | 121 | This means that with FOSS, evaluating the real thing promptly, to any level of detail, without strings of any kind attached, is much simpler. 122 | 123 | ### Available information 124 | 125 | For FOSS products, not only the executable version of the software is available. 
By definition, source code is available as well, which allows for its inspection, and for the evaluation of aspects of quality that need access to it, such as code quality. 126 | 127 | For some non-FOSS, source code may be available, either for all potential users or for those with a certain negotiation power. But that is a rare event. 128 | 129 | In addition, if the development model is open, the development information for the FOSS product is kept available to anyone, not only to the developing community. Even when a single company drives the development of a FOSS product, it may decide to run all development in the open. When the product is produced by a community, the rule is that the development information is available. 130 | 131 | Therefore, the evaluation by third parties of the development processes is possible in the case of FOSS using open development models. 132 | 133 | TBD: repositories where the information about development is available. 134 | 135 | ## Community 136 | 137 | Development and user communities are usually key factors for FOSS products. Healthy development communities ensure the future survivability of the product even better than strong companies. Large, involved user communities ensure the needed pressure to keep the product on the leading edge. 138 | 139 | Therefore, the evaluation of communities is of great importance in the case of FOSS. 140 | 141 | ## Open development 142 | 143 | Some FOSS projects are developed "behind the curtains", no different from traditional projects. For those, little or no information about the development process is available. But fortunately, these projects are the exception in the FOSS world. The usual case is that an open development model is used. 144 | 145 | A simple definition of open development is: 146 | 147 | > Open development is an emerging term used to describe the community-led development model found within many successful free and open source software projects. 148 | 149 | > [Avoiding abandon-ware: getting to grips with the open development method](bib:anderson-avoiding), by Paul Anderson 150 | 151 | That kind of development, because of its very nature, usually makes a lot of information about the internals of the development processes publicly available. That allows for an evaluation of those processes, something that is impossible in the traditional, closed development cases. 152 | 153 | ## Competing market 154 | 155 | The existence of a competing market, with many providers of in-depth support, independent of each other, is possible in the case of FOSS. In the case of non-FOSS, due to the strict control granted by maintaining all copyright rights, only companies in agreement with the producer can provide this kind of service, and therefore no real competing market exists. 156 | 157 | In the case of FOSS, that market can exist. But it does not always exist. In fact, for many FOSS products no specific provider of in-depth support can be found. This is the case for most volunteer-driven and in-house projects, when an organization develops the software for its internal needs. In both cases, until there is enough commercial use of the software, there is no demand for commercial support. Unfortunately, this means that new commercial actors interested in using the software will have more difficulty in doing so, for the very reason that they cannot find support. 158 | 159 | TBD: Generic support companies can be useful here. Detail the case of a single-provider. 
160 | 161 | TBD: importance of evaluating whether such a competing market exists or not, and what it is like. 162 | 163 | ## The importance of transparency 164 | 165 | Free, open source software communities are a matter of trust. All participants want to feel that the rules of the community are fair, and that everyone is judged on the value of their contributions, with no bias due to other factors. For that, it is very important that the information about what happens in the community is transparent, and available to anyone. This is one of the main reasons for adopting open development models in FOSS communities. 166 | 167 | But having the data available is not enough. Especially in large projects, the community needs some means to understand what is happening. The quantity of data may be really large, and it is not easy to extract useful information from it. Therefore, transparency is not only providing the data, but providing the data in a way that is useful for the community, in a way that helps it to understand what's happening at several levels of detail. 168 | 169 | ### The many facets of transparency 170 | 171 | 172 | ## Criteria for evaluation 173 | 174 | * Intangible factors 175 | * Risk 176 | * Functionality 177 | * ... 178 | 179 | 180 | ## Evaluation of functionality 181 | 182 | This is one of the most common evaluations when selecting tools, either to use or to integrate with others. Usually, this is done in the context of a product acquisition procedure, and considers mainly compliance with requirements, quality, and adaptation to certain needs. The evaluation can be used to balance against cost, or to select among products that could fit the requirements. 183 | 184 | Most of this evaluation is not different for FOSS and non-FOSS programs. Only the ease of access to the elements to evaluate makes a difference. In the case of FOSS, source and binary code for the program are easily available for evaluation. Source code may be convenient for understanding how a certain feature works, or for better evaluating performance. In some specific cases, such as evaluating security features, the availability of source code allows for deep inspection. But usually we can just use the general functional evaluation models. Therefore, instead of entering into details we will just sketch how functional evaluation can be done, illustrating it with an example. This example is [Comparing LibreOffice with Apache OpenOffice](bib:jonkers-nouws-comparing-lo-aoo), a comparative functional evaluation of both products. 185 | 186 | The evaluation starts by defining a model of the product to evaluate, and a grouping of its most relevant characteristics. The next picture shows a functional model for the LibreOffice and Apache OpenOffice case. 187 | 188 | ![Comparing LibreOffice and Apache OpenOffice: functional model](functional-evaluation-loaoo-model.png) 189 | *Comparing LibreOffice and Apache OpenOffice: functional model. All the evaluation in the report is based on this model.* 190 | 191 | The functional model defines the main functional components of the software to evaluate. Now, we can define functional features of relevance, and evaluate each of them. The evaluation can be quantitative or qualitative. In the former case, boolean (the feature is available or not) and fractional (the feature is available to a certain fraction of some "ideal" feature) evaluations can be performed. For example, a certain feature can be present in a product, or "80% present" with respect to some ideal feature. 
In the latter case, an expert provides a detailed description of each feature. In many cases, both evaluations can be present, since both can be relevant. 192 | 193 | In the case of the comparison of LibreOffice and Apache OpenOffice, the next picture shows the main relevant features for evaluation, grouped according to the modules defined in the functional model. 194 | 195 | ![Comparing LibreOffice and Apache OpenOffice: relevant features](functional-evaluation-loaoo-features.png) 196 | *Comparing LibreOffice and Apache OpenOffice: identification of relevant features* 197 | 198 | Which are later refined in specific features, which are evaluated to be present or not ("+" or "-" in the next figure). In this case, the functionality is related to the corresponding changes in the source code during a certain period. But this is not necessarily the case. A very similar evaluation can be performed just by defining functional aspects and then verifying if they are present or not in a certain product. 199 | 200 | ![Comparing LibreOffice and Apache OpenOffice: evaluation of features](functional-evaluation-loaoo-features-2.png) 201 | *Comparing LibreOffice and Apache OpenOffice: evaluation of features* 202 | 203 | This information can be later used to produce a report on the functionality found, in a comparison between different products, etc. 204 | 205 | ## Evaluation of suitability 206 | 207 | Example: OpenBRR 208 | 209 | ## Evaluation of quality 210 | 211 | Example: QSOS, Qualoss 212 | 213 | ## Evaluation of maturity 214 | 215 | Example: Polarsys Maturity Model 216 | 217 | ## Evaluation of community and development processes 218 | 219 | Example: The Bitergia evaluation 220 | 221 | -------------------------------------------------------------------------------- /location-debian-devel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/location-debian-devel.png -------------------------------------------------------------------------------- /mls-activity-senders-openstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/mls-activity-senders-openstack.png -------------------------------------------------------------------------------- /openbrr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/openbrr.jpg -------------------------------------------------------------------------------- /openhub-liferay-commits.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/openhub-liferay-commits.png -------------------------------------------------------------------------------- /openhub-liferay.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/openhub-liferay.png -------------------------------------------------------------------------------- /openstack-scm-aging-2013-07.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/openstack-scm-aging-2013-07.png -------------------------------------------------------------------------------- /openstack-scm-aging-2014-07.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/openstack-scm-aging-2014-07.png -------------------------------------------------------------------------------- /osmm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/osmm.png -------------------------------------------------------------------------------- /process-backlog-crs-wikimedia.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/process-backlog-crs-wikimedia.png -------------------------------------------------------------------------------- /processes-crs-nova-time-to-merge.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/processes-crs-nova-time-to-merge.png -------------------------------------------------------------------------------- /processes-tickets-closed-open.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/processes-tickets-closed-open.png -------------------------------------------------------------------------------- /processes.md: -------------------------------------------------------------------------------- 1 | # Evaluating development processes 2 | 3 | Processes are fundamental in software development. We can model as processes most actions in development projects, from implementing a new feature, to fixing a bug, or even to make a decision about how to implement a feature. Many of these processes can be tracked using information available in software development repositories. 4 | 5 | From this point of view, FOSS projects are not different from any other software development project. But, as we discussed in the case of activity, when the project follows an open development model, most of the information needed to track processes is public. Therefore, any third party can use it to evaluate how those processes are being completed. 6 | 7 | ## Performance 8 | 9 | There are several metrics for evaluating performance in processes. Some of the most useful are: 10 | 11 | * Efficiency. Defined as the ratio between finished processes and new (started) processes for a certain period. For example, efficiency in dealing with tickets can be defined as tickets closed / tickets opened per month. Efficiency lower than one means that the project is not coping with new processes: more processes are starting that the project is finishing. Every period that happens, the backlog of open processes will increase. The evolution over time of efficiency allows to understand the long-term trends, and whether a certain efficiency is something temporary, or a permanent trend. 
12 | 13 | ![Efficiency in dealing with tickets, OpenStack project](bmiOpenStackSoftware.jpg) 14 | *Example of efficiency: Ratio of closed to opened tickets per quarter for the OpenStack project. From the [OpenStack Community Activity Report, January-March 2015](), by Bitergia.* 15 | 16 | * Backlog of open processes. Defined as the number of processes currently open at a certain moment. For example, the backlog of code review processes still in progress on a certain date. The backlog of open processes is the workload the project has to deal with, assuming no new workload appears. If the backlog increases, efficiency is lower than one, and the project is not coping with new processes. Of course, other backlogs are possible, for processes in other states different from "open". 17 | 18 | ![Backlog of pending code reviews, Wikimedia projects](process-backlog-crs-wikimedia.png) 19 | *Example of backlog: Pending code review processes in Wikimedia projects, evolution per month, circa July 2015.* 20 | 21 | * Time to attend. Defined as the time from the moment a process is opened to the time it is first attended by the project. For example, the time to attend a certain bug report. Statistics about time to attend tell how responsive the project is in providing some early feedback to the initiator of the process. In some cases, this early action on the process may be automatic, performed by a bot. Even when this is still interesting, since in the end the opener gets some feedback, it is usually more important to know when a human deals with the process for the first time. 22 | * Time to finish. Defined as the time from the moment a process is opened to the time it is finished. 23 | 24 | Both for time to attend and time to finish, the mean for a collection of processes is not significant, because the distribution of times is usually very skewed. Medians or quantiles can be more useful to characterize these times for a collection of processes. Recall what the median and quantiles mean: 25 | 26 | > "A collection of processes has a median t of time-to-something if the longest of the 50% of processes with shortest time-to-something is t. In other words, 50% of the processes were shorter than t, and 50% were longer than t." 27 | 28 | > "A collection of processes has a 0.95 quantile of time-to-something equal to t if the longest of the 95% of processes with shortest time-to-something is t. In other words, 95% of the processes were shorter than t, and 5% were longer than t." 29 | 30 | You can quickly notice that the median is the 0.5 quantile. 31 | 32 | As an example, if the median of time-to-close tickets for a certain month was 23.34 hours, that means that 50% of the tickets were closed in less than 23.34 hours, or, equivalently, that 50% of the tickets took more than 23.34 hours to close. 33 | 34 | As another example, if the 0.95 quantile of time-to-review for a change is 4.34 days, that means that 95% of the changes were reviewed in less than 4.34 days, but 5% of the changes were reviewed in more than 4.34 days. 35 | 36 | You should notice as well that you can only measure time-to-X if X already happened. For example, time-to-close can only be measured for a ticket if it was already closed. Otherwise, you can only know that time-to-close will be longer than the time it has been open up to now, but nothing else. This is important, because it can cause counter-intuitive situations. 37 | 38 | Assume for example a project with 100 still open tickets and 50 already closed tickets. 
All 50 closed tickets had a time-to-close of 2 days. All the 100 open tickets were opened 60 days ago. If we measure the median of time-to-close now, it will be 2 days, since it only applies to closed tickets. Now, suppose the 100 still open tickets are closed today. At the end of the day, we have 50 tickets with 2 days of time-to-close, and 100 with 60 days of time-to-close. That is, our median of time-to-close rose to 60 days, even though we closed a lot of old tickets. In other words, closing a lot of tickets raised time-to-close, which could be interpreted as a decrease in performance, when it is exactly the other way around. 39 | 40 | To avoid these effects, there are some other metrics, such as: 41 | 42 | * time-active. It is defined as the time since the process started, only for processes which have not yet finished. In some sense, it is a complement to time-to-finish: while time-to-finish considers only finished processes, time-active provides similar information, but only for processes still open. 43 | 44 | In the former example, time-active before closing the 100 old tickets would be 60 days for all the tickets to consider (those still open). After closing those 100 tickets, there are no tickets left for calculating time-active, since all of them are closed. 45 | 46 | * time-open. It is defined as time-to-close for processes already closed, and time-active (time since opened) for processes still open. 47 | 48 | In the former example, time-open before closing the 100 old tickets would be 2 days for the closed tickets, and 60 days for the still open tickets. That would be a median of 60 days. After closing the tickets, it would still be 2 and 60 days, now keeping the median at 60 days, which at least remains constant. 49 | 50 | * aggregated-time-open. It is defined as the sum of time-open for all tickets. 51 | 52 | In the former example, before closing the tickets aggregated-time-open is 53 | 54 | ```text 55 | 50 x 2 + 100 x 60 = 6100 days 56 | ``` 57 | 58 | exactly as it will be after closing the 100 old tickets. This allows for a monotonic metric, which produces more intuitive results. 59 | 60 | * aggregated-time-open-diff. We define this as the difference between the current and the previous measurement of aggregated-time-open, when it is measured periodically. This allows us to compare how aggregated-time-open varies over time. 61 | 62 | ### Example: regular and burst processes 63 | 64 | Another example can illustrate a different scenario. Assume now that a project is opening 3 processes every day, and is closing them after two days (at the end of the second day). Metrics will be evaluated at the end of each day. In this case, metrics will evolve as follows: 65 | 66 | |Day|New|Finished|Open|Closed|TTF (median)|TA (median)|TO (median)|ATO | ATOD | 67 | |---|---|--------|----|------|------------|-----------|-----------|----|------| 68 | | 1 | 3 | 0 | 3 | 0 | N/A | 1 | 1 | 3 | N/A | 69 | | 2 | 3 | 3 | 3 | 3 | 2 | 1 | 1.5 | 9 | 6 | 70 | | 3 | 3 | 3 | 3 | 6 | 2 | 1 | 2 | 15 | 6 | 71 | | 4 | 3 | 3 | 3 | 9 | 2 | 1 | 2 | 21 | 6 | 72 | *Table describing a scenario of a project opening and closing processes. Each row represents a day. New: number of new processes opened during the day. Finished: number of processes finished during the day. Open: number of processes still open at the end of the day (this is the backlog of still open processes). Closed: number of processes already closed at the end of the day. TTF: time-to-finish (or time-to-close) for all closed processes at the end of the day. 
TA: time-active, for open processes. TO: time-open, for open and closed processes, at the end of the day. ATO: aggregated-time-open at the end of the day. ATOD: aggregated-time-open-diff at the end of the day. For simplicity, we assume that new processes start at the beginning of the day, and finished processes finish at the end of the day. All times are in days.* 73 | 74 | The median for time-to-finish quickly moves from N/A to 2 once the project starts to finish processes, and remains there, since the time it takes to close tickets is constant. Time-active remains stable at 1, since at the end of the day the processes from yesterday were closed, and only those opened when the day started are still active. Time-open reflects a bit more closely what is happening in days 1, 2, and 3, moving from 1 to 2 as more new processes start. Aggregated-time-open-diff shows the regularity of the system as well. 75 | 76 | Now, let's see how the metrics reflect a peak in new processes. Let's assume that on day 5, in addition to the 3 new processes that are finished in two days, 10 new processes start, and they are not finished during the following days. 77 | 78 | |Day|New|Finished|Open|Closed|TTF (median)|TA (median)|TO (median)|ATO | ATOD | 79 | |---|---|--------|----|------|------------|-----------|-----------|----|------| 80 | | 5 | 13| 3 | 13 | 12 | 2 | 1 | 1 | 37 | 16 | 81 | | 6 | 3 | 3 | 13 | 15 | 2 | 2 | 2 | 53 | 16 | 82 | | 7 | 3 | 3 | 13 | 18 | 2 | 3 | 2 | 69 | 16 | 83 | 84 | On day 5, we have 13 new processes: the 3 "regular" ones, and that peak of 10 more. At the end of the day, we have closed the 3 processes that started on day 4. This means that 13 processes (all that started during the day) remain open at the end of the day. We have a total of 12 closed processes (we had 9 on day 4, plus three more we finished today). The median for time-to-finish remains at 2, since all closed processes took 2 days to finish. All processes still open were opened at the beginning of the day, therefore time-active is 1 for all of them, and so is the median. The median for time-open, on the contrary, and maybe surprisingly, went down to 1. The accounting is as follows: 12 closed processes took 2 days to finish, while 13 open processes have been open for one day. Therefore, more than 50% of the processes have a time-open of 1 day. Aggregated-time-open rose to 37: it was 21, plus 3 more days for the processes finished during the day, plus 13 more days for the new processes that started today. Aggregated-time-open-diff rose to 16 days (37 - 21). 85 | 86 | On day 6, we have only three new "regular" processes, and we finish the three "regulars" that started on day 5. That means that the number of processes still open at the end of the day remains at 13, since none of the 10 "extra" processes that started on day 5 finished. The number of closed processes rises to 15, with the 3 from day 5 that were finished. Time-to-finish remains at 2, since all finished processes still took 2 days to finish. Time-open now rises to 2, with the following accounting: 15 closed processes took 2 days to finish; 10 processes that started on day 5 have been open for 2 days; 3 processes opened today were open for 1 day. In short: for 25 processes time-open is 2, for 3 it is 1. Therefore, the median is 2. With respect to time-active, for the three processes opened at the beginning of the day it is 1, and 2 for the 10 processes opened on day 5. Therefore the median of time-active is 2. 
From all this accounting it is clear that aggregated-time-open is 53, and aggregated-time-open-diff is 16. 87 | 88 | On day 7, new, finished and open remain as on day 6, since only 3 "regular" new processes start. The number of closed processes increases by the 3 that are closed today. Time-to-finish remains at 2. We have 13 open processes, 10 of them 3 days old and three of them 1 day old: the median is therefore 3. Time-open is calculated as follows: 18 processes took two days to close, 10 processes were open for 3 days, 3 processes were open for 1 day. Therefore, the median of time-open remains at 2. From these numbers, aggregated-time-open is 36 + 30 + 3, that is, 69. Aggregated-time-open-diff is 16 once again. 89 | 90 | From this scenario, we can learn several lessons. Time-to-finish does not reflect new processes that are still not finished. They can last for long periods, but will not be reflected in time-to-finish until they are finished. That means that time-to-finish can grow quickly when old processes are finished, which is natural given how we defined the metric, but maybe not what some people expect when they consider a longer time-to-finish as a sign of worse performance. Time-active, meanwhile, reflects how processes still active age, but completely ignores (by design) how long it took to close processes. 91 | 92 | Another lesson is that time-open can be masked by a large population of closed processes. In the example, assuming the pattern of new processes includes only "regular" processes, the median for time-open will remain at 2, even when a large number of open tickets are unattended. 93 | 94 | Aggregated-time-open and especially aggregated-time-open-diff reflect much better what is happening. Aggregated-time-open-diff, in particular, shows how we have a continuous "lag" in dealing with processes, those 16 days of "increase" every day. That metric rose immediately when new processes entered, and will only go down when they are finished. It reflects to some extent the "amount of work still open". 95 | 96 | ### Metrics for periods, metrics for snapshots 97 | 98 | To better understand how the above metrics evolve over time, it is important to consider how exactly they are defined when we want periodic samples of them. The key is characterizing the collection of processes used to calculate the metric. In short, some metrics are defined for collections corresponding to periods, and some others are defined for collections fulfilling some property in given snapshots (cuts in time). Depending on whether they are defined on periods or on snapshots, they behave differently. 99 | 100 | Efficiency and time-to metrics are defined on collections of processes defined over periods. Backlog is defined on collections of processes defined at a point in time, a "snapshot" of the processes. Since snapshots are easier to understand, let's start by explaining them. 101 | 102 | When we want to analyze the evolution of the backlog over time, we define the sampling rate (say, once per week), and the starting point for the time series (say, January 1st at 00:01). 
What we do after that is to measure the backlog at the given points in time, by selecting the collection of open processes (if this is the backlog of open processes), and counting it: 103 | 104 | |Snapshot |Collection |Number| 105 | |----------------|--------------|------| 106 | |2015-01-01 00:01|Processes open| 34 | 107 | |2015-01-08 00:01|Processes open| 23 | 108 | |2015-01-15 00:01|Processes open| 37 | 109 | |2015-01-22 00:01|Processes open| 46 | 110 | |2015-01-29 00:01|Processes open| 51 | 111 | 112 | Collections based on periods are a bit trickier. If we consider for example efficiency, it is defined as the ratio of closed to opened processes. To compare how the system is evolving over time, we need to define comparable collections of processes as time passes. But it is not useful to define those collections as "opened processes" and "closed processes" at some snapshots. The reason is clear: at a certain point in time, the collection will usually contain zero processes, or maybe one (if a process was opened or closed exactly at that point in time), which doesn't make sense for studying the evolution. 113 | 114 | To avoid this problem, we define collections on periods. For example, all tickets opened during the first week of the year, and all tickets closed during the first week of the year. 115 | 116 | But periods are not "points in time", and therefore we have to be careful about how we define them. If they are too short, they are going to capture too many occasional effects, and periods are going to be difficult to compare. But if they are too long, they are going to miss seasonal effects, masked by the "mean behavior". 117 | 118 | If they are not homogeneous enough, they can be misleading. Consider for example two consecutive periods of 10 days each, but one capturing one weekend (Tuesday 7 to Friday 16), and the second one capturing two (Saturday 17 to Sunday 26). If processes are affected by lower activity during weekends, as is usually the case, the second period will appear less active when in fact maybe it is not, considering the seasonal effect of weekends. There are statistical tools that help with these effects, but a good selection of periods can minimize them. Days, weeks, months, and quarters are usually good periods, when their granularity is appropriate for the kind of process being analyzed. 119 | 120 | Another aspect to take into account is which processes we consider as included in the period collection. For example, if we are measuring opened versus closed tickets for a given month, say January 2015, we can consider processes closed during that month, or processes closed at any point in time but opened during that month. The first definition provides information on how the project is performing during a month, in terms of finished work (processes closed) versus new work (processes opened). The second definition provides information about how much of the work that started that month was already completed. Both are interesting, but they are very different. 121 | 122 | For these reasons, it is important to carefully select not only the period, but also the inclusion criteria that determine the collections corresponding to it. It is important as well to define carefully which processes are interesting to select, according to what you want to measure. 
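To make the distinction concrete, here is a minimal sketch (in Python, with a small, made-up list of tickets; the data structure and function names are illustrative, not part of any existing tool) of how the backlog is computed on snapshots, while efficiency and the median of time-to-close are computed on periods:

```python
# Each ticket is an (opened, closed) pair of datetimes; closed is None for
# tickets that are still open. The data is made up for illustration.
from datetime import datetime, timedelta
from statistics import median

tickets = [
    (datetime(2015, 1, 2), datetime(2015, 1, 5)),
    (datetime(2015, 1, 3), None),
    (datetime(2015, 1, 10), datetime(2015, 1, 12)),
    (datetime(2015, 1, 20), None),
]

def backlog(tickets, snapshot):
    """Snapshot metric: tickets already opened and not yet closed at 'snapshot'."""
    return sum(1 for opened, closed in tickets
               if opened <= snapshot and (closed is None or closed > snapshot))

def efficiency(tickets, start, end):
    """Period metric: tickets closed during [start, end) / tickets opened during it."""
    opened_n = sum(1 for opened, _ in tickets if start <= opened < end)
    closed_n = sum(1 for _, closed in tickets
                   if closed is not None and start <= closed < end)
    return closed_n / opened_n if opened_n else None

def median_time_to_close(tickets, start, end):
    """Period metric: median time-to-close of the tickets closed during [start, end)."""
    durations = [closed - opened for opened, closed in tickets
                 if closed is not None and start <= closed < end]
    return median(durations) if durations else None

# Weekly snapshots for the backlog, and one month as the period.
for week in range(5):
    snapshot = datetime(2015, 1, 1, 0, 1) + timedelta(weeks=week)
    print(snapshot, "backlog:", backlog(tickets, snapshot))

jan1, feb1 = datetime(2015, 1, 1), datetime(2015, 2, 1)
print("January efficiency:", efficiency(tickets, jan1, feb1))
print("January median time-to-close:", median_time_to_close(tickets, jan1, feb1))
```

Changing the snapshot dates changes which open tickets are counted, while changing the period boundaries changes which opening and closing events fall into the collections, which is exactly the difference discussed above.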
123 | 124 | ### Some remarks about performance in finishing processes 125 | 126 | In the end, when you are interested in performance in finishing processes, you should consider both the backlog and some statistics (usually the median or some quantile) of time-to-finish or time-open. The backlog will tell you how much work is pending. The time-to-close will tell you how long it took to finish the processes. 127 | 128 | All these metrics have to be considered in the context of activity. This applies especially to efficiency and backlog, but affects other metrics too. For example, in the context of a project where activity is growing quickly, it is relatively normal that efficiency is less than one, but the project is still healthy in the long term. When activity is growing quickly, usually the project is receiving more resources, and its community is growing accordingly. 129 | 130 | But allocating new effort to deal with processes may take some time, while the growth in activity is directly linked to the opening of new processes. Therefore, it is usual that there is a certain lag between the need to close processes, and new people dealing with them. As the project stabilizes, it will start to create new processes more slowly, efficiency will increase, and the backlog will start to decrease. 131 | 132 | ### New features, bug fixing and code review 133 | 134 | Among the tickets in the ITS, most issues usually deal with either feature requests or bug fixes. In fact, many projects require that the process towards any change in the code, either to fix a bug or to implement some new functionality, starts with filing a ticket. 135 | 136 | In the case of new features, the person starting the process may be a developer, and in that case we usually talk about a proposed feature. But it can be a user as well, and then it becomes a feature request. Both developers and users report bugs by opening a bug report ticket. 137 | 138 | From a traditional software engineering point of view, telling features from fixes may be considered important. This is because they signal two different activities: "real" development, when adding new functionality, or maintenance, when fixing bugs. In traditional environments this could be done at different stages in time, and even by different teams with primary responsibilities either in producing the next release (development) or in fixing problems with past releases (maintenance). In many modern projects, especially when they are using continuous release practices, this difference is less important. 139 | 140 | From a practical point of view, be it interesting or not, there are practical problems in doing specific analyses for features or bugs. The main one is how to tell feature requests and proposed features from bug fixes. In some cases, the ITS provides a flag to tell the difference, but even in those cases, it is only an indication that the ticket could be referring to a feature or a bug. On the one hand, bug fixing can evolve into feature implementation, and the other way around. On the other hand, it is not always easy to decide if a certain activity is improving the system by adding some missing functionality, or fixing a bug. 141 | 142 | Consider for example the case of a form not working properly when using a touchpad. Solving it can be understood as fixing a bug (the form should always work, and it was not working in certain circumstances) or as implementing a new feature (support for touchpads). 
When you can match against a detailed list of requirements, this could be solved by deciding whether the form was intended to work with touchpads or not, and then classifying the action as a fix or as new functionality (by adding a new requirement). But most FOSS projects are not that formal, and even when they are, this is in many cases a matter of how requirements are interpreted. 143 | 144 | To complicate matters further, in some projects there are more activities being carried out in the ITS. Those can include discussions on requirements, on the policies of the project, or requests related to the use of the development infrastructure. 145 | 146 | Going a step further in making things difficult for the analyst, some projects use the ticketing system for code review. This was a natural evolution at a time when specialized code review systems didn't exist. In fact, writing comments with opinions on a patch linked to a ticket, or on a commit that closed a ticket, are two examples of code review which can be found in many projects, even when code review was not formally adopted by them. When some projects decided to adopt formal code review procedures, they started by using what they had handy: the ITS. That's how projects such as WebKit defined workflows in the ITS (Bugzilla in their case) to deal with code review. Other projects used workflows defined in Jira. 147 | 148 | With time, specialized systems such as Gerrit emerged. Even though they are focused on code review, they still use a model quite similar to ticketing systems, with each code review cycle being modeled as a ticket. Other systems, such as GitHub pull requests, are even closer to tickets, to the point that the interface they offer is almost the same. 149 | 150 | From the analytics point of view, the good news is that, given these similarities, the analysis of bug reports, feature implementation and code review is quite similar. That is the main reason why in most of this chapter we talk about processes instead of tickets or code review. However, there are some important differences too, for example in the workflows, or in some metrics which are unique to some of these cases but not to the others, and which are still very important for the cases where they apply. 151 | 152 | 153 | ## Workflow patterns 154 | 155 | Projects use different workflows to deal with processes. In many cases, they can even have specific policies, such as defining allowed state transitions for dealing with tickets, or review patterns for dealing with proposals of changes to the source code. When the different states and transitions are recorded in some development repository, the complete process can be tracked. This allows for the analysis of the real workflows, and their compliance with good practices or project policies. In addition, the time between transitions can be measured, to detect bottlenecks and compute times for different workflows. 156 | 157 | ### Workflows for tickets 158 | 159 | The workflow of tickets depends a lot on the specific ITS used. In addition, most of them have a basic workflow defined, but either the administrators of the system, or in some cases its users, can customize it. For example, the next figure shows the default workflow for Bugzilla, which different projects adapt in different ways. 
160 | 161 | ![Default workflow in Bugzilla 4.0](bugzilla-lifecycle.png) 162 | *Default workflow for a ticket in Bugzilla 4.0.1, as shown in [Life cycle of a bug](https://www.bugzilla.org/docs/4.0/en/html/lifecycle.html), in the Bugzilla manual.* 163 | 164 | However, there are some aspects in which most, if not all, ITS behave the same way. The life of a ticket starts when it is opened. In some cases there is a specific "new" or "open" state, in which it stays until the first actions are taken, usually assigning it to someone. This process is called "triaging the ticket". It is very important, because until the ticket is assigned, it is very unlikely that someone starts to work on it. Therefore, "time-to-triage" is a meaningful metric. The process of triaging may involve confirming a bug, or asking for more details about it, or discussing whether a feature request makes sense. 165 | 166 | Once the ticket is "ready to be worked on", some actions are taken until it is resolved. That means that the person in charge considers that the work is done. But in most cases, that has to be verified, so that a third party, maybe the person who opened the ticket, considers that the issue is indeed solved. Therefore, time-to-resolved and time-to-verified are interesting metrics, telling how developers are working, and how good their work is (in terms of satisfying the third parties verifying the solutions). 167 | 168 | In any state, the ticket can be closed. For example, an old untriaged ticket can be closed because no further action can be done on it. Or an unsolved ticket can be closed because new versions render the ticket void. Usually, tickets are closed after being verified. But even in this case, they can be reopened if new evidence arises suggesting that the work is not really done. 169 | 170 | ## Workflows in code review 171 | 172 | In code review, the workflow can be divided into two states: waiting for review and waiting for new change proposals. To go into details, let's consider two of the most usual CRS these days: Gerrit and GitHub Pull Requests. 173 | 174 | For Gerrit, the workflow starts with a developer submitting a proposal for a change (a change). This proposal is composed of a patchset (in fact, the contents of a git commit), and a comment (which may include a reference to the ticket which originated the change proposal). Once the patchset is submitted, anyone can review it, by submitting comments and tags. Tags can range from -2 to 2. Usually +1 means "I agree with this change", and -1 means "I don't agree with this change". -1 should be accompanied by a comment stating what should be changed in the patchset to make that reviewer happy. -2 and +2, when used, usually refer to special reviewers, who in some projects have the decision power. If a given patchset is approved (+1 or +2), it gets merged into the main branch, and is considered to enter "production-ready code". If it gets rejected (-1, -2), the developer is expected to send a new patchset, addressing the concerns of reviewers. In addition, automated testing can also be a cause of rejection. If change submitters come to a point where they don't want to produce new patchsets to address reviewers' concerns, they can abandon the change. 175 | 176 | In this workflow we can measure the total time to accept, but also the time waiting for review: from when a patchset was uploaded to when a reviewing decision on it was made. And the time waiting for the developer: from when a new patchset was requested to when it was uploaded. 
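A minimal sketch of how these two waiting times could be computed, assuming a simplified, made-up event log for a single change (the timestamps, the event kinds and the strict patchset/review alternation are illustrative assumptions, not Gerrit's actual data model):

```python
# Hypothetical event log for one change: (timestamp, kind) pairs, where kind is
# "patchset" (developer uploads a new version) or "review" (a reviewer decides
# on the last patchset, either approving it or requesting changes).
from datetime import datetime, timedelta

events = [
    (datetime(2015, 7, 1, 10), "patchset"),   # patchset 1 uploaded
    (datetime(2015, 7, 2, 9),  "review"),     # reviewer requests changes
    (datetime(2015, 7, 4, 16), "patchset"),   # patchset 2 uploaded
    (datetime(2015, 7, 5, 11), "review"),     # reviewer approves, change merged
]

waiting_for_review = timedelta(0)
waiting_for_developer = timedelta(0)
for (t_prev, kind_prev), (t_next, _) in zip(events, events[1:]):
    if kind_prev == "patchset":
        # From a patchset upload to the next review: the reviewers' turn.
        waiting_for_review += t_next - t_prev
    else:
        # From a review requesting changes to the next patchset: the developer's turn.
        waiting_for_developer += t_next - t_prev

print("patchsets (cycles):", sum(1 for _, kind in events if kind == "patchset"))
print("waiting for review:", waiting_for_review)
print("waiting for developer:", waiting_for_developer)
print("total time to accept:", events[-1][0] - events[0][0])
```

The same accumulation, run over all the changes in a period, would produce the collections from which medians or quantiles of these waiting times can be computed.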
Some other timings, such as how long the automated testing process takes, can also be considered. And finally, the number of cycles, or patchsets, that were needed for a change proposal to be accepted can be an interesting metric too. 177 | 178 | For GitHub Pull Requests, the workflow is a bit simpler. The process starts by submitting a pull request, which is a collection of commits. Anyone can comment on the pull request, and comments can include requests for changes to the commits. The developer can change the commits to take those comments into account. When someone with commit rights considers that the pull request is ready, they can merge it into the code base. Automated testing can be used too, which usually annotates the pull request ticket with comments or flags. 179 | 180 | This process is more flexible than the Gerrit one, but also more difficult to track in a uniform way. For example, it is very difficult to tell when a change to the commits was made, or even if it was made at all, since no tags are mandatory. This means that the times waiting for reviewers or for developers are difficult to measure, or even to define. 181 | 182 | You can see more details about how a specific project uses Gerrit for code review in the [OpenStack Developers Manual](http://docs.openstack.org/infra/manual/developers.html#code-review). For details on GitHub Pull Requests, you can read [Using pull requests](https://help.github.com/articles/using-pull-requests/). 183 | 184 | 185 | ## Participation 186 | 187 | We can also analyze who participated in specific processes in a project. From that point on, we can assess several dimensions, discussed in the sections below, such as diversity in participation or neutrality. 188 | 189 | The relevant participants are different depending on the kind of process. For example, for ticket closing: 190 | 191 | * Openers or submitters. Those that file the tickets, either reporting a bug, requesting or proposing a feature, or raising any other issue. 192 | * Commenters. Those commenting on the ticket. Comments may be requests for more information, provisions of that requested information, reports of progress, proposals to receive feedback, etc. 193 | * State changers. Those changing the state of the ticket. In some ITS, this can be just flagging the ticket in some way. 194 | * Closers. Those closing the tickets. 195 | 196 | But in fact, for any state transition an actor can be defined, which leads to very specific kinds of actors, depending on the project. This means as well that the identification of states and transitions in the ticket workflow is fundamental for the identification of relevant participants. 197 | 198 | As a side note, the identification of relevant participants in specific actions in the ITS may be tricky. For example, in the case of OpenStack, the person considered to be closing a ticket is not really the one changing the state to closed, but the owner of the ticket at that time (the owner is the person assigned to it). 199 | 200 | In the case of code review, the most relevant participants are: 201 | 202 | * Submitters. The persons submitting proposals for change. 203 | * Reviewers. The persons reviewing the changes. 204 | * Commenters. The persons providing comments. These can be reviewers, commenting on how to improve the change proposal, or submitters, commenting on how they will address reviewers' suggestions, or third parties. 205 | * Core reviewers. 
In some projects, there is a special kind of reviewer, elected by the project, who has the power to accept or reject changes. 
206 | 
207 | 
208 | 
209 | ### Diversity in participation 
210 | 
211 | There are several aspects of participation in processes where diversity plays a role: 
212 | 
213 | * Participation from diverse time zones in a specific process, which may help to speed the process up, or to slow it down, depending on the nature of that participation. 
214 | * Participation by organizations, which is an indicator of the level of control that a single actor, or a small group of them, may have over a project. Or the level of dependency that the project has on those actors. 
215 | 
216 | To analyze diversity, it is necessary to carefully determine the relevant actors, and then characterize them from any of these points of view (geographical area, affiliation to organizations, etc.). 
217 | 
218 | ### Neutrality 
219 | 
220 | Neutrality means how neutral the community is towards the different individuals or groups that work together in it. Neutrality is important to the community, because it ensures that all actors are treated equally with respect to characteristics that are not related to their capacities or skills. For example, it ensures that no company intentionally delays fixing bugs that were reported by some other company, or that code proposed by people from a certain region does not take longer to review, all other aspects being equal. 
221 | 
222 | Once the diversity analysis is performed, and the relevant diversity characteristics to consider are defined, the neutrality analysis produces metrics for each of the individuals or groups identified, to allow for comparisons. But those metrics have to be considered with some care. For example, a neutrality analysis can show that time-to-review for developers of company A is twice that of company B. This could easily lead to the conclusion that company A is being discriminated against by reviewers with respect to company B. But it could happen as well that developers from company A are much less careful, or less experienced, or less trained, than those of company B. And that could lead to employees of A submitting much worse change proposals than those of B, which would perfectly explain the difference in time-to-review, since reviewers could be much more reluctant to decide on their code for those reasons. 
223 | 
224 | Therefore, the main goal of the neutrality analysis is to provide metrics that are at the same time fair and relevant. That is, metrics whose differences are really related to discrimination and lack of neutrality, and not to different skills or expertise. 
225 | 
226 | ## Metrics for tickets 
227 | 
228 | Based on all the information above, in this section we study which metrics are most relevant for tickets. Let's start with some timing metrics for feature requests: 
229 | 
230 | * Time to attend: Up to the moment there is some comment by a developer, usually commenting on the feasibility of the request, and maybe assigning it to a developer for implementation. 
231 | * Time to first patch: Up to the moment a patch implementing the feature is attached to the ticket. 
232 | * Time to final patch: Up to the moment a patch lands in the code base intended for the next stable release. In some cases, this is equal to time to first patch, because there is no further process once the patch is produced. But in others, code review or automatic testing stands between the patch and the code base. 
233 | * Time to release: Up to the moment the patch is included in a public release. If the project follows a continuous release policy, this can be exactly equal to time to final patch. But when point releases are produced at discrete moments, this time can be considerably longer. 
234 | * Time to deployment: Up to the moment the patch is deployed in production systems. In many cases, the FOSS project doesn't have a direct reference to this moment, since it happens downstream, in the institutions using the software. 
235 | 
236 | Ratios between new feature requests and final or released patches are interesting to evaluate the performance of the project in terms of how well it is coping with new requests. 
237 | 
238 | In any case, the metrics discussed for new features can be applied to bug fixes too. There are some other interesting metrics as well: 
239 | 
240 | * Reopened bug reports. Those tickets that, after being considered closed because it seemed that the bug was fixed, have to be reopened because it was not fixed at all. 
241 | * Ratio of reopened to closed. Gives an idea of how effective the bug fixing process is. If the ratio is high, that means that many bugs are assumed to be fixed when they are not, and therefore have to be reopened. 
242 | 
243 | Because of the difficulty of differentiating between feature requests and bug reports, in many cases it is useful to obtain the metrics for all tickets together. 
244 | 
245 | ![Closed and open tickets per month, Apache Cloudstack circa July 2015](processes-tickets-closed-open.png) 
246 | *Evolution of closed and open tickets per month for the Apache Cloudstack project circa July 2015* 
247 | 
248 | For example, the chart above shows the evolution of closed and opened tickets per month for a FOSS project. In this case, it can be observed how the number of opened tickets is larger than the number of closed tickets for almost every month. This situation, which is very common in real projects, means that every month the project is not coping with all the new work it receives. From time to time, the project can run "closing parties", or use bots, just to close old bugs that are never going to be fixed, and maybe are no longer bugs. The blue spike in Summer 2013 could be one such case. 
249 | 
250 | ## Metrics for code review 
251 | 
252 | Most of the metrics used for issue tracking systems (either for bug fixes or feature requests) can be applied to code review, if we consider the submission of the patch for review as the starting point of the metric. Therefore, we have: 
253 | 
254 | * Time to attention: Up to the moment a reviewer starts taking action in the code review system. 
255 | * Time to review: Up to the moment reviewers take a decision on the review of the code, such as accepting it, or requesting some changes from the submitter. 
256 | * Time to new submission: From the moment some changes were requested by reviewers, to the moment a new patch is sent. 
257 | * Time to merge: From the moment a code review process starts, to the moment the corresponding change lands in the code base. 
258 | 
259 | ![Time to merge in Nova (OpenStack), 2013-2014](processes-crs-nova-time-to-merge.png) 
260 | *Evolution of time to merge in Nova (an OpenStack subproject), by quarter, during 2013 and 2014.* 
261 | 
262 | Based on these times, aggregated times for each review process can be computed: 
263 | 
264 | * Reviewer time: Aggregated periods while the submitter is waiting for review by reviewers. 
This metric captures the amount of delay in the process which is under the responsibility of reviewers. 
265 | * Submitter time: Aggregated periods while reviewers are waiting for a new submission. This metric captures the amount of delay which is under the responsibility of submitters. 
266 | 
267 | Therefore, a long time to review may be due to a long reviewer time or a long submitter time. The problems the project is facing in each case, and the means to solve them, are very different. 
268 | 
269 | Time is not the only interesting dimension: 
270 | 
271 | * Number of review cycles: Number of reviews (from submission of a change to acceptance or request for a new change) that are needed to finish a review process. 
272 | 
273 | In addition, metrics about the effectiveness of the code review process are useful too. For example: 
274 | 
275 | * Ratio of abandoned to merged changes. Gives an idea of how many of the submitted change proposals end nowhere, with respect to how many end up in the code base. 
276 | * Ratio of merged to submitted changes. This is a kind of success ratio, showing how many review processes finish with a change in the code base with respect to how many were started. 
277 | 
278 | ## Metrics for mailing lists 
279 | 
280 | In mailing lists and other asynchronous communication systems, process metrics can be useful as well. Communication channels can be used by users to solve problems or report them. This interaction can be monitored in several repositories. In our experience, this can be done mainly in ITS, but also in asynchronous communication systems. From those, it can be known: 
281 | 
282 | * Who asks for help, or in general comments on issues related to the project. 
283 | * Who participates in solving those issues. 
284 | * Which of those who participate are developers. 
285 | 
286 | And some timing information, such as: 
287 | 
288 | * Time to first answer. From the moment a question is asked, to the moment the first answer arrives (see the sketch after this list). 
289 | * Time for whole discussion. The timespan from the first message in a thread to the last one. 
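As an illustration, the following minimal sketch computes both metrics for one thread. It assumes the messages have already been retrieved and grouped into a thread; the data layout is invented for the example, not that of any particular mailing list archiver.

```python
from datetime import datetime

# Hypothetical, already-parsed mailing list thread: (timestamp, sender) pairs,
# ordered by arrival; the first message is the question.
thread = [
    ("2015-03-02 09:12", "user@example.org"),
    ("2015-03-02 14:40", "dev1@example.org"),
    ("2015-03-05 08:05", "user@example.org"),
]

def thread_metrics(thread):
    """Return (time_to_first_answer, whole_discussion) in hours."""
    dates = [datetime.strptime(ts, "%Y-%m-%d %H:%M") for ts, _ in thread]
    first_answer = None
    if len(dates) > 1:
        first_answer = (dates[1] - dates[0]).total_seconds() / 3600
    whole_discussion = (dates[-1] - dates[0]).total_seconds() / 3600
    return first_answer, whole_discussion

print(thread_metrics(thread))  # roughly (5.47, 70.88) hours
```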
290 | 291 | -------------------------------------------------------------------------------- /qsos-maturity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/qsos-maturity.png -------------------------------------------------------------------------------- /qsos-radar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/qsos-radar.png -------------------------------------------------------------------------------- /qsos-tools.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/qsos-tools.png -------------------------------------------------------------------------------- /qsos.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/qsos.png -------------------------------------------------------------------------------- /quantitative.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | -------------------------------------------------------------------------------- /scs-irc-example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/scs-irc-example.png -------------------------------------------------------------------------------- /sources_of_information.md: -------------------------------------------------------------------------------- 1 | # Sources of information 2 | 3 | When evaluating FOSS projects there are many potential sources of information, which in many cases, and specially when the project follows the open development paradigm, are public. Some of these sources are: 4 | 5 | * Source code management systems (SCM), such as git, Mercurial or Subversion. All changes to the source code, and in some cases, to documentation and other related artifacts, are stored in them. From these systems we can extract any past version of the source code, which allows for code analysis, code inspection, etc. SCM systems store metainformation for every change, which usually includes who authored and committed it, when, the files involved with the corresponding diffs, etc. 6 | * Issue tracking systems (ITS), also names ticketing systems or bug reporting systems, although they are used for much more than reporting bugs. Most projects use them for tracking how bugs are reported and fixed, and how new functionality is proposed, defined and built. But they can also use ITS for tracking the policy decision or the code review process, or even for tracking how the infrastructure is managed. Their repositories are usually modeled as tickets, which experiment changes in state until they are closed. Some examples are Bugzilla, Lanuchpad, GitHub Issues, Allura Tickets, Trac or RedMine. 7 | * Code review systems (CRS). Used to track the code review process. They are usually modeled very similar to ITS, but specialized for code review, and in many cases integrated with SCM. Some examples are Gerrit and GitHub Pull Requests. 8 | * Asynchronous communication systems (ACS). 
The most classical ones are mailing lists and forums. But more recently others are emerging, such as question/answer systems, of which StackOverflow and Askbot are good examples. The repositories for ACSs are usually modeled as messages, grouped in threads, either implicit or explicit. Each message includes content, but also metainformation such as author, date, etc. 9 | * Synchronous communication systems (SCS). Some examples of them are IRC or more recently, Slack. 10 | * Testing and continuous integration systems (CI). Some examples are Jenkins, Hudson and TravisCI. 11 | * Web sites and other repositories for content related to the project. They may include documentation, download areas with binaries ready to run, etc. 12 | 13 | All these systems usually offer means for persons to interact with them, which can be used to get a first hand impression of how the project is using them. Qualitative evaluations can benefit from this kind of browsing of information. This can be done, for example, by reading comments in commit records, messages in mailing list archives or IRC logs, or the history of tickets in ITSs. 14 | 15 | It is also possible to use tools to retrieve information from them, usually via APIs designed for that matter. This allows for the automated retrieval of information for performing quantitative evaluation, or for storing all the data in a database for further analysis. 16 | 17 | ## Source code management systems 18 | 19 | Information in SCMs is usually organized in "changes", which in most systems are named "commits". The information stored with each change is different for different SCM systems, but it always incude the change itself (some way of identifying which lines were affected by the change, and how), and some metainformation. The metainformation is about who and when produced the change, and some other information related to it. For example, in the case of git, that metainformation associated with each change includes: 20 | 21 | * A unique identifier for the commit (dubbed the "commit hash"). 22 | * The commit id of the previous commit(s) in the history of the repository. 23 | * An identifier for the author (the person writing the change). 24 | * An identifier for the committer (the person committing the change to the repository). 25 | * Dates for authoring and committing actions. 26 | * A comment produced by the author of the change. 27 | * Complete diff with the changes (differences introduced by the change with respect to the previous situation). 28 | 29 | ### An example of a git commit record 30 | 31 | To illustrate how to obtain this information, see below an example. It is the information in a git commit record, as produced using the command ```git show --pretty=fuller``` for a commit in the CVSAnalY repository (excluding the diff): 32 | 33 | ``` 34 | commit 364f67f13b0046c0a0a688b30a1341ff9946ac26 35 | Author: Santiago Dueñas 36 | AuthorDate: Fri Oct 11 12:55:44 2013 +0200 37 | Commit: Santiago Dueñas 38 | CommitDate: Fri Oct 11 12:55:44 2013 +0200 39 | 40 | [db] Add author's commit date 41 | 42 | Some SCMs, like Git, make a distinction between the dates when 43 | the committer and author pushed the changes. 44 | ... 45 | ``` 46 | 47 | The first line shows the commit hash, the next four are the authoring and committing identifiers and dates. In this case, author and committer are the same person, and authoring adn committing dates are the same. After those come several lines with the comment produced by the author, describing the change. 
After the comment comes the diff, with the list of changes (which were ommitted in the snippet, see below). 48 | 49 | The raw information in the commit record can be obtained with ```git show --pretty=raw```: 50 | 51 | ``` 52 | commit 364f67f13b0046c0a0a688b30a1341ff9946ac26 53 | tree c121da67fcba250490b6b326deae46f041b76626 54 | parent 99acc4d7762e3773751f17ae9f0b58169f5e4de0 55 | author Santiago Dueñas 1381488944 +0200 56 | committer Santiago Dueñas 1381488944 +0200 57 | 58 | [db] Add author's commit date 59 | 60 | Some SCMs, like Git, make a distinction between the dates when 61 | the committer and author pushed the changes. 62 | ... 63 | ``` 64 | 65 | This format is harder to read for humans, but includes more information (such as the previous commit, or "parent"), and is therefore more useful for mining data. 66 | 67 | The first lines of the diff that was omitted for the above commit are shown below: 68 | 69 | ```diff 70 | diff --git a/pycvsanaly2/DBContentHandler.py b/pycvsanaly2/DBContentHandler.py 71 | index 579e103..4b0066d 100644 72 | --- a/pycvsanaly2/DBContentHandler.py 73 | +++ b/pycvsanaly2/DBContentHandler.py 74 | @@ -149,7 +149,7 @@ class DBContentHandler (ContentHandler): 75 | self.actions = [] 76 | profiler_stop ("Inserting actions for repository %d", (self.repo_id,)) 77 | if self.commits: 78 | - commits = [(c.id, c.rev, c.committer, c.author, c.date, c.message, c.composed_rev, c.repository_id) for c in self.commits] 79 | + commits = [(c.id, c.rev, c.committer, c.author, c.date, c.author_date, c.message, c.composed_rev, c.repository_id) for c in self.commits] 80 | profiler_start ("Inserting commits for repository %d", (self.repo_id,)) 81 | cursor.executemany (statement (DBLog.__insert__, self.db.place_holder), commits) 82 | self.commits = [] 83 | diff --git a/pycvsanaly2/Database.py b/pycvsanaly2/Database.py 84 | index dacf406..9b02909 100644 85 | --- a/pycvsanaly2/Database.py 86 | +++ b/pycvsanaly2/Database.py 87 | ``` 88 | 89 | The first lines of the diff shows how to invoke the diff command to produce the output for the first file changed by the commit (a refers to the situation before the change, b to the situation after the change). Lines changed are those starting with - (removed) or + (added). In this case, a change in a line is modeled as removing the old line and adding a new one with the change. 90 | 91 | ### Notes on using information from SCM systems 92 | 93 | Identifiers for authors and committers are usually a name and an email address. But in some cases, such as Subversion, only a user name is available. 94 | 95 | In decentraliized SCMs, such as git, dates for authoring and committing include local timezones, usually those of the computer in which the corresponding action was performed. This allows for timezone analysis to infer regions where developers work. In addition, since times are local, studies on daily schedules can also be performed. But in the case of centralized SCMs, such as Subversion, those dates are for the server where commits took place, which means that no information about local time for developers is avilable. 96 | 97 | The information in the SCM allows for the reconstruction of the whole history of the repository. In the case of git, the information about the previous commit or commits (in the case of merges, there is nore than one previous commit) allows for the recovery of the full history of the repository. Using that information and some other hints, you can decide to which branch a change was committed. 
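As an illustration of how this metainformation can be retrieved automatically, the sketch below uses the git command line (via the Python standard library) to extract one record per commit, including its parents, author, and dates. The repository path at the end is just a placeholder, and the output layout is an assumption for the example.

```python
import subprocess

# Minimal sketch: pull commit metadata from a local git repository in a
# machine-readable form, as a starting point for quantitative analysis.
# Tab-separated fields: hash, parent hashes, author name, author email,
# author date, committer date.
FORMAT = "%H%x09%P%x09%an%x09%ae%x09%ai%x09%ci"

def commit_records(repo_path):
    out = subprocess.check_output(
        ["git", "-C", repo_path, "log", "--all", "--pretty=format:" + FORMAT],
        universal_newlines=True)
    for line in out.splitlines():
        chash, parents, name, email, adate, cdate = line.split("\t")
        yield {
            "hash": chash,
            "parents": parents.split(),  # more than one parent for merges
            "author": (name, email),
            "author_date": adate,        # includes the author's local offset
            "committer_date": cdate,
        }

# Example: count commits and merge commits in a repository cloned at ./repo
records = list(commit_records("./repo"))
print(len(records), sum(1 for r in records if len(r["parents"]) > 1))
```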
98 | 
99 | The complete diff which is available for each change allows for the complete reconstruction of the code modifications, which is the ultimate reason for storing it. But it can also be used for inferring the files involved, the size of the change, and other parameters. 
100 | 
101 | It is important to note that, from a historical point of view, the information provided by the SCM system is not always reliable. For example, in the case of git, developers can "rewrite" history when they amend or rebase. Therefore, a current retrieval of information from a git repository may show different data for some commits in the past than a similar retrieval done some time ago, or even a different list of commits. For systems which only commit with an automated tool after code review, which checks for and forbids history rewrites, this is not an issue. But most projects do not have specific rules or technical measures to avoid history rewriting, and therefore any results about past history need to be understood as "current past history". 
102 | 
103 | ## Issue tracking systems 
104 | 
105 | Information in issue tracking systems is usually organized in "tickets". This is why they are also called ticketing systems. In fact, the job of the ITS is to track the changes to each ticket. Therefore, most ITS maintain both a record for each ticket with its current state, and a history of all changes to its state, or a list of past states. 
106 | 
107 | The information usually found for a ticket is: 
108 | 
109 | * Identifier. Unique identifier for each ticket. 
110 | * Summary. A one-line text summarizing the purpose of the ticket. 
111 | * Description. A longer description of the purpose of the ticket. This can be reporting a bug, requesting a feature, or starting a new task, for example. 
112 | * Opening date. When the ticket was opened. 
113 | * Ticket opener. An identifier for the person opening the ticket. Depending on the ITS, this can be a full name, an email address, or a user name. 
114 | * Ticket assignee. The person assigned by the project to deal with the ticket. 
115 | * Priority. A number or a text informing about the priority of the ticket. This is usually set by the ticket opener, but can be adjusted later by developers. 
116 | * State. A tag with the current state of the ticket. Examples of states are "open" (ticket opened, but still not dealt with by the project), "assigned" (ticket already assigned to some developer), "fixed" (ticket is considered to be fixed), "closed" (the issue is considered to be done). 
117 | 
118 | Almost all fields in the ticket are subject to changes. When this happens, the change is recorded, including information about who made the change, when, and what the change was about. 
119 | 
120 | In addition, a ticket usually has an associated list of comments. These comments are posted by the different persons who contribute to closing the ticket. Some of them may be from the ticket opener, such as when a clarification is posted. Some others may be from people interested in the ticket, such as other people experiencing the same problem. Some others may be by developers trying to solve the issue. Each comment is composed of some text (the comment itself), the posting date, the author of the comment, and maybe some other fields. 
121 | 
122 | But despite having a similar structure, tickets in different ITS may be presented to users in very different ways, as shown in the next examples. 
123 | 
124 | ### Example of ticket in GitHub 
125 | 
126 | In GitHub, tickets are called "issues". 
They are presented in a single HTML page, showing the description, history, and comments. Most of the current state is shown in the right column. 
127 | 
128 | ![GitHub issue](its-github-example.png) 
129 | *Example of ticket: [GitHub issue from the Bicho project](https://github.com/MetricsGrimoire/Bicho/issues/122). The most relevant fields can be observed. The first text in the left column is the description of the ticket, the second one is a comment.* 
130 | 
131 | GitHub also provides the same information via the [GitHub Issues API](https://developer.github.com/v3/issues/), which is easier to use for automated retrieval of information. 
132 | 
133 | ### Example of ticket in Bugzilla 
134 | 
135 | In Bugzilla, tickets are called "bugs". Bugzilla is one of the earliest free software ITS, and is currently in use by some very large communities, such as Eclipse. In the following example, the status information for a ticket is shown. Below that area, the list of comments on the ticket is found, starting with its description. 
136 | 
137 | ![Bugzilla issue](its-bugzilla-example.png) 
138 | *Example of ticket: [Bugzilla issue from the Eclipse Mylyn project](https://bugs.eclipse.org/bugs/show_bug.cgi?id=438817). Comments come below this status information.* 
139 | 
140 | Bugzilla provides similar information via an XML file, which is more suitable for automated retrieval. 
141 | 
142 | ### Notes on using information from an ITS 
143 | 
144 | First of all, it is important to notice that projects may use tickets for many different kinds of actions. The best known one is reporting a bug, and that's the reason why these systems are also called "bug reporting systems". But tickets may refer to feature requests, tasks being performed, or even policy discussions. It is up to the policies and habits of a project to decide what kind of communication is channeled through the ITS, and that varies a lot from project to project. Therefore, counts of, for example, "open tickets" are different from counts of "open bugs". 
145 | 
146 | ITSs are also very different in which kind of information they store, and what it means. For example, all of them include some way of encoding the "state" of the ticket. But while GitHub uses labels for that, Bugzilla uses a "status" field. In addition, each project may use this information in different ways, and in many cases they define their own tags, status fields, or whatever the ITS uses for this. That makes it very difficult to know at first sight even whether a ticket is a bug, or whether a bug is actually fixed. The workflow for tickets, and how and by whom they can be moved from state to state, is usually project-defined, although the ITS may constrain it or make recommendations. For example, the next figure shows the recommended workflow for Bugzilla. 
147 | 
148 | ![Bugzilla recommended workflow](its-bugzilla-workflow.png) 
149 | *[Recommended workflow in Bugzilla](https://www.bugzilla.org/docs/3.6/en/html/lifecycle.html). Original figure illustrates [Life Cycle of a Bug](https://www.bugzilla.org/docs/3.6/en/html/lifecycle.html), in [The Bugzilla Guide - 3.6.13 Release](https://www.bugzilla.org/docs/3.6/en/html/index.html), distributed under the GNU Free Documentation License.* 
150 | 
151 | This said, ITSs provide rich information about how the project is dealing with some of its most important development processes. For example, they allow for the calculation of time-to-fix for bugs, or time-to-implementation for feature requests. 
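As a minimal illustration, time-to-fix could be computed as in the sketch below, assuming tickets have already been retrieved from the ITS into plain dictionaries; the field names are invented for the example, not those of any particular ITS API.

```python
from datetime import datetime

# Hypothetical, already-retrieved tickets; field names are illustrative only.
tickets = [
    {"id": 1, "opened": "2015-01-10", "closed": "2015-01-25", "kind": "bug"},
    {"id": 2, "opened": "2015-02-01", "closed": None, "kind": "bug"},
    {"id": 3, "opened": "2015-02-03", "closed": "2015-03-01", "kind": "feature"},
]

def time_to_fix(tickets, kind="bug"):
    """Days from opening to closing, for closed tickets of the given kind."""
    days = []
    for t in tickets:
        if t["kind"] == kind and t["closed"]:
            opened = datetime.strptime(t["opened"], "%Y-%m-%d")
            closed = datetime.strptime(t["closed"], "%Y-%m-%d")
            days.append((closed - opened).days)
    return days

fixes = time_to_fix(tickets)
print(sorted(fixes)[len(fixes) // 2])  # median-ish time-to-fix, in days: 15
```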
They allow as well for the identification of the persons carrying on most of the maintenance, or the key developers in implementing some kind of new functionality. 152 | 153 | In some cases, the ITS carries too processes such as code review or documentation management, which have their own peculiarities. 154 | 155 | ## Code review systems 156 | 157 | Code review systems are used, as their name implies, for reviewing source code. Changes to code are organized as patches, which are submited by developers to the system, and then commented and reviewed by reviewers. Depending on the project, maybe only some developers can review, or only some can accept or decline changes. 158 | 159 | The information structure of a CRS is quite similar to that of ITS. The role of tickets is taken by proposed patches. Proposed patches go through changes in state as the review cycle progresses. Some reviewers propose to accept the change, some others may ask for a new version, finally the change is accepted or maybe abandoned. All of them (reviewers and change proposers) can write comments as well. 160 | 161 | Due to this similarity, some projects use the ITS for code review. This is for example the case of WebKit, which uses Bugzilla. But during the last years, specialized systems for code review have been adopted in most cases. The most prominent cases are Gerrit and GitHub Pull Requests. 162 | 163 | ### Example of code review in Gerrit 164 | 165 | Gerrit tracks "changes", usually in combination with git. Each change is a commit, which is reviewed by flagging it with +1 (proposal for accepting the change) or -1 (proposal for asking for a new version of the change). Each version of a change is called "patchset", and the developer is expected to submit one patchset after another until reviewers are happy with the change. In this case, the acceptance of the change is signalled by flaging it with +2. Gerrit can be tuned to the specific policies of a project, so that for example a certain number of +2 is needed to accept (merge) a change, or that only core reviewers can flag a change with +2. 166 | 167 | ![Gerrit code review](crs-openstack-example.png) 168 | *Example of code review: [Gerrit code review for OpenStack](https://review.openstack.org/#/c/191195/)* 169 | 170 | Gerrit can also be connected with testing and continuous integration systems, so that automated tests are run before and after a change is reviewed. A change submitted to Gerrit finishes its review process, possibly after several patchsets are submitted, when it is accepted (merged), or when the developer decides to withold it (abandon). 171 | 172 | ### Example of code review in GitHub 173 | 174 | In GitHub, code review is done via "pull requests", which are a kind of specialized tickets (issues). The process starts when a developer submits a change. To prepare for that, they usually have previously forked the corresponding git repository, and committed the proposed change to it. Then, that commit can be proposed for pull request via the GitHub web interface. Once it is proposed, it appears to GitHub users as a ticket, with the peculiarity that the corresponding change can be explored, commented, and merged. 175 | 176 | ![GitHub code review](crs-github-example.png) 177 | *Example of code review: [GitHub pull request](https://github.com/VizGrimoire/GrimoireLib/pull/57)* 178 | 179 | GitHub does not allow for tracking different versions of the change. In fact, different versions can be submitted, by changing the commit (eg, ammending it). 
But that information is lost when the corresponding ticket is browsed: only the last version of the change is available. 180 | 181 | Since a pull request is really a ticket, it can also be labeled, assigned, closed, etc. 182 | 183 | ### Notes on using information from a CRS 184 | 185 | CRS are very important when tracking processes in software development. When a project uses mandatory code review, any new piece of code has to be through the CRS, which tracks times, people involved, etc. Therefore the information mined from it can be used to learn about how long does code review lasts, and who is responsible for delays in code review: developers who fail to submit quiclly new versions of a change, or reviewers who are slow in reviewing it. 186 | 187 | People involved in review processes is also a very interesting information. Commit records only keep information about the author and maybe the person merging the commit. Code review provides a much more detailed information: all people commenting, or proposing approval or decline of chaneges are tracked. In addition, CRS usually provide links with testing and continuous integration (CI) systems, which may say a lot about how good proposed changes are. 188 | 189 | In many cases, tickets refering bug reports or feture requests corresponding to the change are linked as well. This allows for more complete analysis of all the development process. For example, that allows to track, since the moment a bug is reported to when a fix is proposed, how it evolves to the final merged change, how it finally passes the CI tests, until it is ready for deployment in production environments. 190 | 191 | ## Asynchronous communication systems 192 | 193 | In the early ages of FOSS projects, most communication was asynchronous. The most popular tools were email (using mailing lists) and USENET News (in some cases, connected to a mailing list). With time, web forums became important in some communities too, and the relevance of USENET News declined, to the point of fading away. Some special-purpose forums, such as StackOverflow or Reddit, emerged during the last years as important points of communication, even if the projects didn't decide on using them. 194 | 195 | Each of of these systems organize information in a different way, although they have some aspects in common. All of them are organized around "messages" (email messages, posts in a forum, questions in StackOverflow, etc.) which have an author, a date, and in most cases a one-line summary. In most cases messages are related in threads, usually as they reply or mention each other. 196 | But common aspects stop here. The structure of messages, how threads are organized, and other ancillary information are different from system to sytem. 197 | 198 | ## Example of ACS: email message in Mailman 199 | 200 | Mailman is a mailing list manager. You can create mailing lists, and manage them via a website. It has been one of the most usual choices for handling mailing lists during the 2000s. It provides an HTML interface, which gives access to all messages. But it also offers archives in mbox format, which is much easier to mine. The information stored in those archives includes usually most of the headers in the original messages. Below you can find an example of one of those messages, as seen via the HTML interface. 
201 | 
202 | ![Mailman email message](acs-mailman-example.png) 
203 | *Example of asynchronous communication: [email message in Mailman](https://lists.libresoft.es/pipermail/metrics-grimoire/2015-March/002419.html)* 
204 | 
205 | The main fields of information can be seen in this message: the subject (a summary of the message), the sender (including email address, slightly mangled), the date, and the contents. Some other headers may be available. 
206 | 
207 | One important detail is the date. In most cases, this includes at least the date as stamped by the mail server (usually, Mailman itself). But it can also include the date set by the mailer program used to send the message, usually in the local time of the sender. This means that analysis both by local time and by universal time is possible. 
208 | 
209 | The contents of the message can be huge, since they may include attachments. Depending on the configuration, attachments may or may not be available. 
210 | 
211 | Mailman manages the list of subscribers to the mailing list, but it doesn't track its history. Therefore, the current list of subscribers can be retrieved, but it is not possible to obtain past lists for a historical analysis. 
212 | 
213 | Privacy settings may interfere with mining, even for public mailing lists. For example, archives may be available only to subscribers, or email addresses may be mangled. Both cases make mining a bit more complicated, or make some analyses impossible. 
214 | 
215 | A final comment: public mailing lists can always be subscribed to. This means that a miner can subscribe, and produce an archive with all details present in incoming messages. The history before the subscription won't be available, but from that point on, all details can be easily accessed. 
216 | 
217 | ### Notes on using information from ACS 
218 | 
219 | Apache was the first FOSS community to explicitly state that "[If it didn't happen on a mailing list, it didn't happen](https://community.apache.org/newbiefaq.html#NewbieFAQ-IsthereaCodeofConductforApacheprojects?)". Since then, but also before, many others have put this principle into practice, even without stating a specific policy for enforcing it. This means that mailing lists and forums are of great importance for tracking the coordination activities of projects, and how they discuss and take decisions. 
220 | 
221 | But archives for forums and mailing lists are not always available, or they are available only in part, or they are available in ways that are difficult to mine. For example, some projects don't keep archives for some periods of their history, or rely on systems that, even when they perform archiving, are very mining-unfriendly. A notable case is Google Groups, which is used as the ACS by many projects. This system doesn't have an API for mining information, which means that web spidering is the only way to retrieve it. On the contrary, several mailing list managers (such as Mailman) and archiving systems (such as Gmane) have very good facilities for automatically retrieving archived information. 
222 | 
223 | Privacy of email addresses is a big problem for mailing list archives, but also for mailing list miners. To prevent spam, many archives mangle email addresses, in a way that makes it impossible to know the actual email address of the sender of a message. This means that identification of developers cannot be done based on email addresses, as it can be done for example for git repositories. 
Therefore, it is not possible to know if a certain poster is the same person that authored a commit. Some types of analysis and evaluation, which rely on this identification, can therefore not be done in those cases. An example is the analysis of how many developers participate in a mailing list. 224 | 225 | Forums have also their own problems. First of all, each forum has its own peculiarities. Some of them have APIs which make mining very simple. But some others have APIs not really designed for mining, lacking fundamental capabilities like incremental searching of posts, which makes it complex and resource-demanding to retrieve their information. In some other cases, APIs simply don't exist, and the only way to mine messages is to get database dumps for all the infomation they store, which is obviously difficult. 226 | 227 | ## Synchronous communication systems 228 | 229 | The traditional synchronous communication system of choice in FOSS projects has been IRC (Internet Relay Chat). Those are used for casual conversation, support to users, quick discussions between developers, and even for socializing. 230 | 231 | Other communication channels used by FOSS projects have been those based on XMPP, such as those provided by Jabber, and more recently, Slack. 232 | 233 | The information obtained from all these systems can be organized in a similar way. The unit of communication is the message, which can be related to its author, and to the date it was posted. In most of these systems, communication is organized in channels, and therefore messages can be related to a channel as well. 234 | 235 | ### Example of SCS: IRC log 236 | 237 | IRC channels on Freenode are one of the most popular synchronous communication channels used by FOSS projects. People participating in the channels use some software to connect to Freenode. That software lets them see all messages posted to the channel while they are connected, and send messages as well. The software also informs about who is connected to the channel, and produces notifications when new people join or leave it. 238 | 239 | To log all these interactions, usually IRC bots are used. Those are programs that connect to the channel, and record all interactions received from it. These bots produce files with those logs that the project uses to make conversations public, and to preserve them for the future. 240 | 241 | ![IRC log from an OpenStack channel](scs-irc-example.png) 242 | *Example of SCS: [Log of the #openstack-operators IRC channel](http://eavesdrop.openstack.org/irclogs/%23openstack-operators/%23openstack-operators.2015-06-10.log.html) of the OpenStack project.* 243 | 244 | The figure above shows a fraction of the log produced by one of these bots. It can be seen how both messages and join/leave notifications are recorded, and how for each of them the time is available (the date is implicit, since one log file is produced per day). The identifier for the person sending each message or entering or leaving the channel is recorded as well. 245 | 246 | In IRC servers, person identifiers can be reserved, so it is usual that frequent participants use always the same name. But this is not always the case: non registered identifiers are available for anyone to use. 247 | 248 | ### Notes on using information from SCS 249 | 250 | One of the main trouble with SCS systems is that it is not usually easy to track the identities of people participating in the channels. Nothing like email addresses is available. 
Although the regulars in the channels are usually identifiable, for most participants this is not the case. 
251 | 
252 | In some cases there are privacy concerns with logging SCS channels. Although they are usually public, to comply with the openness that most projects mandate, they are not perceived by some developers the way mailing lists are. Due to their immediacy, people may be inclined to forget that what they say is public, and may find themselves saying things that they wouldn't really say in public. When bots are used for recording, some people may be reluctant to participate, fearing that those cases are recorded and linked to them in public. However, although these concerns exist, public channels are really public, and most projects deal with them exactly as they do with public mailing lists: recording, archiving, and making archives public. 
253 | 
254 | SCS recordings are a good data source for spotting people with high involvement in the project. Being available to participate in discussions with anyone joining the channel, or to answer quick questions, are signals that can be interesting to track. 
255 | 
256 | ## Testing and continuous integration systems 
257 | 
258 | In recent years more and more projects have been adding an infrastructure for performing automatic testing. Continuous integration, when used by projects, usually requires that those tests are run for each change, so that automatic tools can decide whether it is safe to integrate the change. When the project uses code review, automated testing is usually part of every review cycle, to save reviewers time, so they can focus on those changes that passed the testing process. 
259 | 
260 | Some of the most used systems for continuous integration in FOSS projects are Jenkins / Hudson and Travis. 
261 | 
262 | ## Impact on the infrastructure of the projects 
263 | 
264 | Mining may have a significant impact on the performance of the infrastructure of the project. In fact, some mining activities can be identified by the sysadmins of that infrastructure as a kind of DoS (denial of service) attack. But even when they are not, those activities can cause a lot of stress on the project infrastructure. For example, retrieving all information from the ITS means querying it for all tickets, and then, for each ticket, obtaining all changes and comments. This is not only a significant effort for the ITS, but also an effort for which it was not really designed. 
265 | 
266 | An ITS should quickly provide listings of open tickets, or recent tickets assigned to some developers, or the most recent status of a certain ticket. But the actions performed by programs retrieving information for mining are very different from this "usual behavior". 
267 | 
268 | There are some practices that should be followed to ensure that the impact on the project infrastructure is minimal: 
269 | 
270 | * Design the retrieving tools to be as repository-friendly as possible. For example, use delays between operations, so that the system is not "hijacked" by the program, letting users perform their tasks. 
271 | * Contact the project for advice if it is anticipated that the infrastructure is going to be stressed. Project sysadmins can help design retrieval scenarios that cause as little harm as possible. 
272 | * Whenever possible, use archives with previously retrieved information. In this case, the information is retrieved only once, but can later be consumed for many different studies by different parties. 
If the information is reliable enough, there is no need to stress the infrastructure once and again. In addition, you don't have to retrieve the information yourself, and can concentrate in designing and performing your analysis. -------------------------------------------------------------------------------- /sqo-oss.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/sqo-oss.png -------------------------------------------------------------------------------- /stackalitics-main.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/stackalitics-main.png -------------------------------------------------------------------------------- /stackalytics-mirantis.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/stackalytics-mirantis.png -------------------------------------------------------------------------------- /styles/pdf.css: -------------------------------------------------------------------------------- 1 | 2 | { 3 | font-size: 200%; 4 | } 5 | -------------------------------------------------------------------------------- /to_probe_further.md: -------------------------------------------------------------------------------- 1 | # To probe further 2 | 3 | This is an still disorganized list of random links. It will evolve hopefully in an organized list of useful resources. 4 | 5 | ## Evaluation of community growth 6 | 7 | * [Brackets Health Report](http://blog.brackets.io/2015/03/27/introducing-brackets-health-report/) 8 | * Polarsys Maturity Model 9 | * [Apache Maturity Model](https://community.apache.org/apache-way/apache-project-maturity-model.html) 10 | * Open Business Readiness Rating (OpenBRR): [website](http://www.openbrr.org/) 11 | * [Comparing OpenBRR, QSOS, and OMM Assessment Models](http://link.springer.com/chapter/10.1007/978-3-642-13244-5_18) 12 | * [Modèle de maturité OSS](http://inventarisoss.smals.be/fr/160-RCH.html), Smals Recehrche. Evolution of OpenBRR. 13 | * Qualipso Open Source Maturity Model (OMM). [Entry in Wikipedia](http://en.wikipedia.org/wiki/OpenSource_Maturity_Model), [details in Qualipso-related website](http://qualipso.icmc.usp.br/OMM/) 14 | * [Open-source software assessment methodologies](http://en.wikipedia.org/wiki/Open-source_software_assessment_methodologies), article in Wikipedia. 15 | ## Software evaluation (mainly functional, but not only): 16 | 17 | * [Software Evaluation Guide](http://www.software.ac.uk/software-evaluation-guide), [Criteria-based Software Evaluation](http://software.ac.uk/sites/default/files/SSI-SoftwareEvaluationCriteria.pdf) and [Tutorial-based Software Evaluation](http://software.ac.uk/sites/default/files/SSI-SoftwareEvaluationTutorial.pdf), by the [Software Sustainability Institute](http://www.software.ac.uk). 18 | * [Quantitative Methods for Software Selection and Evaluation](http://www.sei.cmu.edu/reports/06tn026.pdf), by Michael S. 
Bandor, September 2006 19 | -------------------------------------------------------------------------------- /tz-scm-authors-2013.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/tz-scm-authors-2013.png -------------------------------------------------------------------------------- /tz-scm-authors-2014.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jgbarah/evaluating-foss-projects/2e9618fc68220ca1b1f6c338cbd3878ab1747482/tz-scm-authors-2014.png --------------------------------------------------------------------------------