Repository Identification and Mining Techniques

Repository Identification

Identifying blockchain software repositories

To ensure valid results, we surveyed only BCS developers with sufficient experience. We identified 145 BCS projects based on the following four criteria:

  • Tagged under at least one of the following six ‘topics’: blockchain, cryptocurrency, altcoin, ethereum, bitcoin, and smart-contracts.
  • ‘Starred’ by at least ten users.
  • Have at least five distinct contributors.
  • Confirmed as a BCS project by manual verification of the repository.
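
The four criteria above can be sketched as a simple filter. This is a minimal illustration, not the tooling used in the study: the record layout (`full_name`, `topics`, `stars`, `contributors`) and the function names are hypothetical, and the manual verification step is represented only as a set of repository names that a human reviewer has already confirmed.

```python
from typing import Iterable

# The six topics and the two thresholds are taken from the criteria above.
BCS_TOPICS = frozenset({
    "blockchain", "cryptocurrency", "altcoin",
    "ethereum", "bitcoin", "smart-contracts",
})
MIN_STARS = 10
MIN_CONTRIBUTORS = 5


def meets_automated_criteria(repo: dict) -> bool:
    """Check the three criteria that can be tested mechanically.

    `repo` is a hypothetical record with keys 'topics', 'stars',
    and 'contributors'; it is not a GitHub API response object.
    """
    topics = set(repo.get("topics", []))
    return (
        bool(topics & BCS_TOPICS)
        and repo.get("stars", 0) >= MIN_STARS
        and repo.get("contributors", 0) >= MIN_CONTRIBUTORS
    )


def select_bcs_projects(repos: Iterable[dict], manually_verified: set) -> list:
    """Keep repositories that pass the automated checks and were
    confirmed as BCS projects by a human reviewer."""
    return [
        repo["full_name"]
        for repo in repos
        if meets_automated_criteria(repo)
        and repo["full_name"] in manually_verified
    ]
```

Keeping the manual check as a separate input makes explicit that the final criterion cannot be automated: a repository enters the sample only if it passes both the mechanical filter and the reviewer's judgment.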

We used the GitHub API to identify 1,604 contributors, each of whom had submitted at least five changes to one of those 145 projects. We then mined the Git commit logs of the 145 projects to gather the email addresses of these 1,604 active contributors. The survey questions, consent form, participant selection strategy, solicitation email, and data management were reviewed and approved by our university’s Institutional Review Board (IRB).
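
The contributor threshold described above can be illustrated with a short sketch. It rests on assumptions not stated in the text: a ‘change’ is treated as one commit, contributors are keyed by author email, and counts are derived from local clones with `git log --format=%ae` rather than from the GitHub API. The function names and `MIN_CHANGES` are hypothetical.

```python
import subprocess
from collections import Counter

MIN_CHANGES = 5  # threshold stated in the text


def author_emails(repo_path: str) -> list:
    """Return one author email per commit in a local clone.

    Uses `git log --format=%ae`, which prints the author email
    of each commit on its own line.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return [
        line.strip().lower()
        for line in result.stdout.splitlines()
        if line.strip()
    ]


def active_contributors(emails_by_project: dict) -> set:
    """Return emails with at least MIN_CHANGES commits in at least
    one single project (commits are not summed across projects)."""
    active = set()
    for emails in emails_by_project.values():
        counts = Counter(emails)
        active.update(
            email for email, n in counts.items() if n >= MIN_CHANGES
        )
    return active
```

Counting per project rather than across projects follows the wording "at least five changes to one of those 145 projects". A developer with several addresses would be counted once per address here, so a real pipeline would likely need an identity-merging step.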

We also collected developer demographics data from the 2018 Stack Overflow Annual Developer Survey (referred to as the ‘SO Survey’ hereinafter) (Overflow 2017). Stack Overflow has run the annual developer survey since 2011. The primary objective of the SO Surveys is to learn who contemporary developers are and what they need. These surveys cover a wide range of developer demographics such as age, education, location, gender, role, experience, ethnicity, and favorite technologies. Since the 2018 SO Survey received responses from 98,855 developers in 183 countries, it provides an accurate overview of active software developers worldwide. Therefore, comparing our respondents against the demographics of the SO Survey enables us to identify whether certain groups are underrepresented or overrepresented in the BCS community.
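
One simple way to express such a comparison is a ratio of proportions. The sketch below is an assumption for illustration only: the text does not specify which statistic is used, and the function name and arguments are hypothetical. A ratio above 1 suggests a group is overrepresented in the BCS sample relative to the SO Survey, and a ratio below 1 suggests it is underrepresented.

```python
def representation_ratio(
    group_in_sample: int,
    sample_total: int,
    group_in_reference: int,
    reference_total: int,
) -> float:
    """Share of a group in the BCS sample divided by its share
    in the reference population (here, the SO Survey)."""
    if sample_total <= 0 or reference_total <= 0:
        raise ValueError("totals must be positive")
    if group_in_reference <= 0:
        raise ValueError("group must be present in the reference data")
    sample_share = group_in_sample / sample_total
    reference_share = group_in_reference / reference_total
    return sample_share / reference_share
```

A ratio alone does not indicate whether a difference is statistically significant; a test of proportions would be needed for that.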