Welcome to the Merobase Data Sets

This page provides data that we have collected in our research on large-scale analysis of source code. These data sets are available for other researchers and individuals to use. The data is provided as-is. Please refer to the terms of use that come with each data set for any usage restrictions.

Currently available data sets:
All these files, together with 689,214 files from the open web available via HTTP, are part of the currently available (December 2012) search index of merobase.com. The downloadable Lucene index is approximately 49 GB in size.

Tool support:
Calculator ( add(int,int):int; ) lang:java

Citation Policy

If you publish material based on data sets obtained from this repository, please acknowledge the assistance you received by using this repository. This will help others to obtain the same data sets and replicate your experiments.

The premier reference for our work is the following, given here as a BibTeX citation:

@inproceedings{JHS2013,
  Title = {An Unabridged Source Code Dataset for Research in Software Reuse},
  Author = {Janjic, Werner and Hummel, Oliver and Schumacher, Marcus and Atkinson, Colin},
  Booktitle = {Proceedings of the Tenth International Workshop on Mining Software Repositories (MSR'13)},
  Address = {San Francisco, CA, USA},
  Organization = {IEEE Press},
  Pages = {339--342},
  Year = {2013}
}

Alternatively, you may cite the following publication, again given as a BibTeX citation:

@ARTICLE{Hummel+Janjic+Atkinson:2008,
  author = "O. Hummel and W. Janjic and C. Atkinson",
  journal = "IEEE Software",
  title = "Code Conjurer: Pulling Reusable Software out of Thin Air",
  year = "2008",
  month = "Sept.--Oct.",
  volume = "25",
  number = "5",
  pages = "45--52",
  doi = "10.1109/MS.2008.110",
  ISSN = "0740-7459"
}
For the user's convenience, this page is based on the template of the Sourcerer Data Set web page.
(c) the Software-Engineering group at the University of Mannheim, Germany.