Project Name

  • Proposed name for the project: SEPAN (Software dEfect Prediction Attention Network)

Project description

  • SEPAN predicts which commits to a repository carry a high risk of containing a bug. The prediction is made by a deep-learning model whose main component is an attention network.
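The proposal does not describe SEPAN's network architecture in detail. As a rough illustration only, the following sketch shows scaled dot-product attention pooling over a commit's token embeddings, followed by a logistic output that yields a bug-risk probability. The function names, the toy two-dimensional vectors and the single query vector are hypothetical assumptions; in a real model these vectors and weights would be learned during training, and SEPAN's actual architecture may differ.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical illustration only: not SEPAN's actual model. All vectors and
# weights here are hand-picked toy values; a trained network would learn them.

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens: Sequence[Vector], query: Vector) -> Tuple[List[float], List[float]]:
    # Scaled dot-product scores: how relevant each token is to the query.
    dim = len(query)
    scores = [dot(t, query) / math.sqrt(dim) for t in tokens]
    weights = softmax(scores)
    # The weighted sum of the token vectors gives one fixed-size commit vector.
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return pooled, weights


def commit_risk(tokens: Sequence[Vector], query: Vector,
                out_weights: Vector, bias: float = 0.0) -> Tuple[float, List[float]]:
    # Logistic output layer: estimated probability that the commit contains a bug.
    pooled, weights = attention_pool(tokens, query)
    logit = dot(pooled, out_weights) + bias
    return 1.0 / (1.0 + math.exp(-logit)), weights


if __name__ == "__main__":
    # Three toy token embeddings for one commit (hypothetical values).
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    risk, weights = commit_risk(tokens, query=[2.0, 0.0], out_weights=[1.0, -1.0])
    print(f"risk={risk:.3f}", [round(w, 3) for w in weights])
```

The attention weights always sum to 1, so the pooled vector has the same size regardless of how many tokens the commit contains.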


  • The project addresses the area of software development, with an initial focus on code written in the Python programming language.
  • The project provides a baseline component that could be incorporated into a tool that advises developers, before they commit, on the likelihood that the commit contains a bug. At least initially, the project would provide only this baseline deep-learning component that makes the prediction, not the integration with developer tools.
  • There are no upstream sources. The deep-learning neural network is currently trained on data obtained from the OpenStack repository.
  • We foresee no local testing or integration that would need to be performed on Nordix infrastructure. SEPAN only requires GitHub repository space.
  • The project is extensible in a number of ways:
    • the initial commit will support only Python; one way of extending the project would be to support additional programming languages. For example, a well-known repository of a Java-based Linux Foundation application could be used to train a version of the software that supports Java
    • another avenue for extensibility is adding new deep-learning algorithms to the framework; this enables benchmarking new algorithms against the previously trained networks and thus provides an accurate means of evaluating improvements. Some examples are included in (insert_link_the_MSc_thesis_when_published_by_KTH)
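Adding new algorithms and benchmarking them against earlier ones could be organised around a common predictor interface. The following sketch assumes a hypothetical registry (`REGISTRY`, `register`) and a hypothetical `LabeledCommit` record with precomputed numeric features; it scores every registered predictor on the same labelled commits using precision, recall and F1. The two registered predictors are trivial placeholders standing in for trained networks, and none of these names come from the SEPAN code base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical framework sketch: names and structure are illustrative only.


@dataclass
class LabeledCommit:
    commit_id: str
    features: Sequence[float]  # precomputed features (assumed representation)
    is_buggy: bool             # ground-truth label, e.g. derived from trouble tickets


# A predictor maps a commit to an estimated bug probability in [0, 1].
Predictor = Callable[[LabeledCommit], float]
REGISTRY: Dict[str, Predictor] = {}


def register(name: str) -> Callable[[Predictor], Predictor]:
    # Decorator that adds a new algorithm to the benchmark registry.
    def decorator(predictor: Predictor) -> Predictor:
        REGISTRY[name] = predictor
        return predictor
    return decorator


def evaluate(predictor: Predictor, commits: List[LabeledCommit],
             threshold: float = 0.5) -> Dict[str, float]:
    # Count true positives, false positives and false negatives at the threshold.
    tp = fp = fn = 0
    for c in commits:
        flagged = predictor(c) >= threshold
        if flagged and c.is_buggy:
            tp += 1
        elif flagged:
            fp += 1
        elif c.is_buggy:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def benchmark(commits: List[LabeledCommit],
              threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    # Score every registered predictor on the same data for a fair comparison.
    return {name: evaluate(p, commits, threshold) for name, p in REGISTRY.items()}


@register("always_clean")
def always_clean(commit: LabeledCommit) -> float:
    # Trivial baseline: never flags a commit.
    return 0.0


@register("large_change")
def large_change(commit: LabeledCommit) -> float:
    # Placeholder heuristic standing in for a trained network.
    return 1.0 if sum(commit.features) > 10 else 0.0
```

With this design, a new algorithm only needs a function matching the `Predictor` signature and a `@register(...)` decorator to be compared against the existing ones by `benchmark`.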

Infrastructure needs:

  • one (or two) GitHub repositories, depending on how the project will be structured.
    • the main GitHub repository contains only source code and needs a small amount of storage (100 MB would suffice for quite a while); this repository is expected to be updated more often, and most contributors would bring their contributions here
    • potentially, a second GitHub repository will contain the data initially open-sourced with SEPAN and needs more storage (the original commit would be about 10 GB, and each trained network plus open source repository processed for a new programming language would need roughly the same, so it could start with 30 GB); only a few contributors would get the access rights to add data to this repository. It would contain:
      • processed Openstack source code, commits and trouble tickets
      • trained deep learning neural network


  • No dependencies.

Other open source projects in this area:

Project members and contributors:

  • Committers/maintainers: Abgeiba Isunza Navarro (KTH), Catalin Meirosu (Ericsson)
  • Contributors: Martin Monperrus (KTH)