- Proposed name for the project: SEPAN (Software dEfect Prediction Attention Network)
- The software predicts which commits to a repository carry a high risk of introducing a bug. The prediction is made by a deep-learning model that uses attention networks as its main component.
- The project addresses the area of software development, with an initial focus on software written in the Python programming language
- The project provides a baseline component that could be incorporated into a tool that advises developers, before they make a commit, on the likelihood that the commit includes a bug. At least initially, the project would provide only the baseline deep-learning component that makes such predictions, not the overall integration with developer tools.
- There are no upstream sources. The deep learning neural network is currently trained on data obtained from the OpenStack repository.
- We foresee no local testing or integration that would need to be performed on Nordix infrastructure. SEPAN only asks for GitHub repository space.
- The project is extensible in a number of ways:
- the initial commit will support only Python as a programming language - one way of extending it would be to support additional programming languages. For example, a well-known repository for a Java-based Linux Foundation application could be used for training a version of the software that would support the Java programming language
- another avenue for extensibility is to add new deep-learning algorithms to the framework - this enables benchmarking the new algorithms against the previously trained networks and thus provides an accurate means of evaluating the improvements. Some examples in this respect are included in (insert_link_the_MSc_thesis_when_published_by_KTH)
- one (or two) GitHub repositories, depending on how the project will be structured.
- the main GitHub repository contains only source code; a small amount of storage is needed (100 MB would suffice for quite a while); this repository is expected to be updated more often, and most contributors would bring their contributions here
- potentially, a second GitHub repository will contain the data that is initially open sourced with SEPAN; a larger amount of storage is needed (the original commit would be 10 GB; each trained network and each open source repository processed for a new programming language would need about the same space; could start with 30 GB); only a few contributors would get access rights for adding data to this repo
- processed OpenStack source code, commits and trouble tickets
- trained deep learning neural network
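
As a rough illustration of the kind of attention-based prediction described at the top of this proposal, the sketch below scores a commit from a list of token vectors. It is not SEPAN's actual architecture: the function names (`attention_pool`, `commit_risk`), the dot-product attention form, and the toy vectors are all assumptions made for illustration, and a real model would learn the query and classifier weights from the processed training data.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vectors, query):
    """Hypothetical dot-product attention over the tokens of one commit.

    Each token vector is scored against a query vector; the scores become
    attention weights, and the commit is summarized as the weighted average
    of its token vectors.
    """
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, token_vectors))
        for i in range(dim)
    ]
    return pooled, weights


def commit_risk(token_vectors, query, classifier_weights, bias=0.0):
    """Map the pooled commit representation to a risk score in (0, 1)."""
    pooled, weights = attention_pool(token_vectors, query)
    logit = sum(c * p for c, p in zip(classifier_weights, pooled)) + bias
    risk = 1.0 / (1.0 + math.exp(-logit))  # logistic function
    return risk, weights


# Toy input (assumed values): two tokens with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0]]
risk, weights = commit_risk(tokens, query=[1.0, 0.0], classifier_weights=[2.0, -1.0])
```

The attention weights also indicate which tokens contributed most to the score, which is the kind of signal a developer-facing tool could surface alongside the prediction.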
Other open source projects in this area:
Project members and contributors:
- Committers/maintainers: Abgeiba Isunza Navarro (KTH), Catalin Meirosu (Ericsson)
- Contributors: Martin Monperrus (KTH)