publications
publications by categories in reversed chronological order.
2023
- JSSFinding associations between natural and computer languages: A case-study of bilingual LDA applied to the bleeping computer forum postsKundi Yao, Gustavo A. Oliva, Ahmed E. Hassan, and 3 more authorsJournal of Systems and Software, 2023
In the context of technical support, trails of technical discussions often contain a mixture of natural language (e.g., English) and software log excerpts. Uncovering latent links between certain problems and log excerpts that are often requested during the discussions of those problems enables the construction of a valuable knowledge base. Nevertheless, uncovering such latent links is challenging because English and software logs are two fundamentally different languages. In this paper, we investigate the suitability of multilingual LDA models to address the problem at hand. We study three models, namely: enriched LDA (M+), two-layer LDA (M2L), and off-the-shelf bilingual LDA (Mbi). We use approximately 8K discussion threads from a Bleeping Computer forum as our dataset. We observe that M2L performs the best overall, although it yields a substantially coarser-grained view of the discussed themes in the threads (20 topics, 0.3% of the documents). We also note that M+ outperforms Mbi achieving higher coherence, lower perplexity, and higher cross-lingual coverage ratio. We invite future studies to qualitatively assess the quality of the topics produced by the LDA models, such that the feasibility of employing such models in practice can be better determined.
2022
- SANERMining Software Information Sites to Recommend Cross-Language Analogical LibrariesKawser Wazed Nafi, Muhammad Asaduzzaman, Banani Roy, and 2 more authors2022
Software development is largely dependent on libraries to reuse existing functionalities instead of reinventing the wheel. Software developers often need to find analogical libraries (libraries similar to ones they are already familiar with) as an analogical library may offer improved or additional features. Developers also need to search for analogical libraries across programming languages when developing applications in different languages or for different platforms. However, manually searching for analogical libraries is a time-consuming and difficult task. This paper presents a technique, called XLibRec, that recommends analogical libraries across different programming languages. XLibRec collects Stack Overflow question titles containing library names, library usage information from Stack Overflow posts, and library descriptions from a third party website, Libraries.io. We generate word-vectors for each information and calculate a weight-based cosine similarity score from them to recommend analogical libraries. We performed an extensive evaluation using a large number of analogical libraries across four different programming languages. Results from our evaluation show that the proposed technique can recommend cross-language analogical libraries with great accuracy. The precision for the Top-3 recommendations ranges from 62-81% and has achieved 8-45% higher precision than the state-of-the-art technique.
- TSEA Study of Bug Management Using the Stack Exchange Question and Answering PlatformAaditya Bhatia, Shaowei Wang, Muhammad Asaduzzaman, and 1 more authorIEEE Transactions on Software Engineering, 2022
Traditional bug management systems, like Bugzilla, are widely used in open source and commercial projects. Stack Exchange uses its online question and answer (Q&A) platform to collect and manage bugs, which brings several new unique features that are not offered in traditional bug management systems. Users can edit bug reports, use different communication channels, and vote on bug reports, answers, and their associated comments. Understanding how these features manage bug reports can provide insights to the designers of traditional bug management systems, like whether a feature should be introduced? and how would users leverage such a feature? We performed a large-scale analysis of 19,151 bug reports of the bug management system of Stack Exchange and studied the in-place editing, the answering and commenting, and the voting features. We find that: 1) The three features are used actively. 2) 57 percent of the edits improved the quality of bug reports. 3) Commenting provides a channel for discussing bug-related information, while answering offers a channel for explaining the causes of a bug and bug-fix information. 4) Downvotes are made due to the disagreement of the reported “bug” being a real bug and the low quality of bug reports. Based on our findings, we provide suggestions for traditional bug management systems.
2021
- ICSECOSTER: A Tool for Finding Fully Qualified Names of API Elements in Online Code SnippetsC M Khaled Saifullah, Muhammad Asaduzzaman, and Chanchal K. Roy2021
Code snippets available on question answering sites (e.g., Stack Overflow) are a great source of information for learning how to use APIs. However, it is difficult to determine which APIs are discussed in those code snippets because they often suffer from declaration ambiguities and missing external references. In this paper, we introduce COSTER, a context-sensitive type solver that can determine the fully qualified names (FQNs) of API elements in those code snippets. The tool uses three different similarity measures to rank potential FQNs of a query API element. Results from our quantitative evaluation and user study demonstrate that the proposed tool can not only recommend FQNs of API elements with great accuracy but can also help developers to reuse online code snippets by suggesting the required import statements.
2020
- SANERExploring Type Inference Techniques of Dynamically Typed LanguagesC. M. Khaled Saifullah, Muhammad Asaduzzaman, and Chanchal K. Roy2020
Developers often prefer dynamically typed programming languages, such as JavaScript, because such languages do not require explicit type declarations. However, such a feature hinders software engineering tasks, such as code completion, type related bug fixes and so on. Deep learning-based techniques are proposed in the literature to infer the types of code elements in JavaScript snippets. These techniques are computationally expensive. While several type inference techniques have been developed to detect types in code snippets written in statically typed languages, it is not clear how effective those techniques are for inferring types in dynamically typed languages, such as JavaScript. In this paper, we investigate the type inference techniques of JavaScript to understand the above two issues further. While doing that we propose a new technique that considers the locally specific code tokens as the context to infer the types of code elements. The evaluation result shows that the proposed technique is 20-47% more accurate than the statically typed language-based techniques and 5–14 times faster than the deep learning techniques without sacrificing accuracy. Our analysis of sensitivity, overlapping of predicted types and the number of training examples justify the importance of our technique.
- EMSECAPS: a supervised technique for classifying Stack Overflow posts concerning API issuesMd Ahasanuzzaman, Muhammad Asaduzzaman, Chanchal K. Roy, and 1 more author2020
The design and maintenance of APIs (Application Programming Interfaces) are complex tasks due to the constantly changing requirements of their users. Despite the efforts of their designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn these issues to fix them. Question answering sites, such as Stack Overflow (SO), have become a popular place for discussing API issues. These posts about API issues are invaluable to API designers, not only because they can help to learn more about the problem but also because they can facilitate learning the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make the task of detecting SO posts concerning API issues difficult and challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We use the above information together with different features collected from posts, the experience of users, readability metrics and centrality measures of collaboration network to build a technique, called CAPS, that can classify SO posts concerning API issues. In total, we consider 34 features along eight different dimensions. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We then conduct studies to find important features and also evaluate the performance of the CRF-based technique for classifying issue sentences. Comparison with two other baseline approaches shows that the technique has high potential. We also test the generalizability of CAPS results, evaluate the effectiveness of different classifiers, and identify the impact of different feature sets.
2019
- ASELearning from Examples to Find Fully Qualified Names of API Elements in Code SnippetsC M Khaled Saifullah, Muhammad Asaduzzaman, and Chanchal K. Roy2019
Developers often reuse code snippets from online forums, such as Stack Overflow, to learn API usages of software frameworks or libraries. These code snippets often contain ambiguous undeclared external references. Such external references make it difficult to learn and use those APIs correctly. In particular, reusing code snippets containing such ambiguous undeclared external references requires significant manual efforts and expertise to resolve them. Manually resolving fully qualified names (FQN) of API elements is a non-trivial task. In this paper, we propose a novel context-sensitive technique, called COSTER, to resolve FQNs of API elements in such code snippets. The proposed technique collects locally specific source code elements as well as globally related tokens as the context of FQNs, calculates likelihood scores, and builds an occurrence likelihood dictionary (OLD). Given an API element as a query, COSTER captures the context of the query API element, matches that with the FQNs of API elements stored in the OLD, and rank those matched FQNs leveraging three different scores: likelihood, context similarity, and name similarity scores. Evaluation with more than 600K code examples collected from GitHub and two different Stack Overflow datasets shows that our proposed technique improves precision by 4-6% and recall by 3-22% compared to state-of-the-art techniques. The proposed technique significantly reduces the training time compared to the StatType, a state-of-the-art technique, without sacrificing accuracy. Extensive analyses on results demonstrate the robustness of the proposed technique.
2018
- SANERClassifying stack overflow posts on API issuesMd Ahasanuzzaman, Muhammad Asaduzzaman, Chanchal K. Roy, and 1 more author2018
The design and maintenance of APIs are complex tasks due to the constantly changing requirements of its users. Despite the efforts of its designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn these issues to fix them. Question answering sites, such as Stack Overflow (SO), has become a popular place for discussing API issues. These posts about API issues are invaluable to API designers, not only because they can help to learn more about the problem but also because they can facilitate learning the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make the task of detecting SO posts concerning API issues difficult and challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We use the above information together with different features of posts and experience of users to build a technique, called CAPS, that can classify SO posts concerning API issues. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We also conduct studies to test the generalizability of CAPS results and to understand the effects of different sources of information on it.
2017
- ICSMERecommending Framework Extension ExamplesMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more author2017
The use of software frameworks enables the delivery of common functionality but with significantly less effort than when developing from scratch. To meet application specific requirements, the behavior of a framework needs to be customized via extension points. A common way of customizing framework behavior is by passing a framework related object as an argument to an API call. Such an object can be created by subclassing an existing framework class or interface, or by directly customizing an existing framework object. However, to do this effectively requires developers to have extensive knowledge of the framework’s extension points and their interactions. To aid the developers in this regard, we propose and evaluate a graph mining approach for extension point management. Specifically, we propose a taxonomy of extension patterns to categorize the various ways an extension point has been used in the code examples. Our approach mines a large amount of code examples to discover all extension points and patterns for each framework class. Given a framework class that is being used, our approach aids the developer by following a two-step recommendation process. First, it recommends all the extension points that are available in the class. Once the developer chooses an extension point, our approach then discovers all of its usage patterns and recommends the best code examples for each pattern. Using five frameworks, we evaluate the performance of our two-step recommendation, in terms of precision, recall, and F-measure. We also report several statistics related to framework extension points.
- ASEFEMIR: A tool for recommending framework extension examplesMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more authorOct 2017
Software frameworks enable developers to reuse existing well tested functionalities instead of taking the burden of implementing everything from scratch. However, to meet application specific requirements, the frameworks need to be customized via extension points. This is often done by passing a framework related object as an argument to an API call. To enable such customizations, the object can be created by extending a framework class, implementing an interface, or changing the properties of the object via API calls. However, it is both a common and non-trivial task to find all the details related to the customizations. In this paper, we present a tool, called FEMIR, that utilizes partial program analysis and graph mining technique to detect, group, and rank framework extension examples. The tool extends existing code completion infrastructure to inform developers about customization choices, enabling them to browse through extension points of a framework, and frequent usages of each point in terms of code examples. A video demo is made available at https://asaduzzamanparvez.wordpress.com/femir.
2016
- J. Softw. E. P.A Simple, Efficient, Context-Sensitive Approach for Code CompletionMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more authorJ. Softw. Evol. Process, Jul 2016
Code completion helps developers use application programming interfaces APIs and frees them from remembering every detail. In this paper, we first describe a novel technique called Context-sensitive Code Completion CSCC for improving the performance of API method call completion. CSCC is context sensitive in that it uses new sources of information as the context of a target method call. CSCC indexes method calls in code examples by their context. To recommend completion proposals, CSCC ranks candidate methods by the similarities between their contexts and the context of the target call. Evaluation using a set of subject systems and five popular state-of-the-art techniques suggests that CSCC performs better than existing type or example-based code completion systems. We conduct experiments to find how different contextual elements of the target call benefit CSCC. Next, we investigate the adaptability of the technique to support another form of code completion, i.e., field completion. Evaluation with eight different subject systems suggests that CSCC can easily support field completion with high accuracy. Finally, we compare CSCC with four popular statistical language models that support code completion. Results of statistical tests from our study suggest that CSCC not only outperforms those techniques that are based on token level language models, but also in most cases performs better or equally well with GraLan, the state-of-the-art graph-based language model. Copyright © 2016 John Wiley & Sons, Ltd.
- MSRHow Developers Use Exception Handling in Java?Muhammad Asaduzzaman, Muhammad Ahasanuzzaman, Chanchal K. Roy, and 1 more authorMay 2016
Exception handling is a technique that addresses exceptional conditions in applications, allowing the normal flow of execution to continue in the event of an exception and/or to report on such events. Although exception handling techniques, features and bad coding practices have been discussed both in developer communities and in the literature, there is a marked lack of empirical evidence on how developers use exception handling in practice. In this paper we use the Boa language and infrastructure to analyze 274k open source Java projects in GitHub to discover how developers use exception handling. We not only consider various exception handling features but also explore bad coding practices and their relation to the experience of developers. Our results provide some interesting insights. For example, we found that bad exception handling coding practices are common in open source Java projects and regardless of experience all developers use bad exception handling coding practices.
2015
- ICSMEExploring API method parameter recommendationsMuhammad Asaduzzaman, Chanchal K. Roy, Samiul Monir, and 1 more authorSep 2015
A number of techniques have been developed that support method call completion. However, there has been little research on the problem of method parameter completion. In this paper, we first present a study that helps us to understand how developers complete method parameters. Based on our observations, we developed a recommendation technique, called Parc, that collects parameter usage context using a source code localness property that suggests that developers tend to collocate related code fragments. Parc uses previous code examples together with contextual and static type analysis to recommend method parameters. Evaluating our technique against the only available state-of-the-art tool using a number of subject systems and different Java libraries shows that our approach has potential. We also explore the parameter recommendation support provided by the Eclipse Java Development Tools (JDT). Finally, we discuss limitations of our proposed technique and outline future research directions.
- ICSMEPARC: Recommending API methods parametersMuhammad Asaduzzaman, Chanchal K. Roy, and Kevin A. SchneiderSep 2015
APIs have grown considerably in size. To free developers from remembering every detail of an API, code completion has become an integral part of modern IDEs. Most work on code completion targets completing API method calls and leaves the task of completing method parameters to the developers. However, parameter completion is also a non-trivial task. We present an Eclipse plugin, called PARC, that supports automatic completion of API method parameters. The tool is based on the localness property of source code, which states that developers tend to put related code fragments close together. PARC combines contextual and static type analysis to support a wide range of parameter expression types.
2014
- ICSMECSCC: Simple, Efficient, Context Sensitive Code CompletionMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more authorSep 2014
Code Completion helps developers learn APIs and frees them from remembering every detail. In this paper, we describe a novel technique called CSCC (Context Sensitive Code Completion) for improving the performance of API method call completion. CSCC is context sensitive in that it uses new sources of information as the context of a target method call. CSCC indexes method calls in code examples by their contexts. To recommend completion proposals, CSCC ranks candidate methods by the similarities between their contexts and the context of the target call. Evaluation using a set of subject systems and five popular state of-the-art techniques suggests that CSCC performs better than existing type or example-based code completion systems. We also investigate how the different contextual elements of the target call benefit CSCC.
- ICSMEContext-Sensitive Code Completion Tool for Better API UsabilityMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more authorIn 2014 IEEE International Conference on Software Maintenance and Evolution, Sep 2014
Developers depend on APIs of frameworks and libraries to support the development process. Due to the large number of existing APIs, it is difficult to learn, remember, and use them during the development of a software. To mitigate the problem, modern integrated development environments provide code completion facilities that free developers from remembering every detail. In this paper, we introduce CSCC, a simple, efficient context-sensitive code completion tool that leverages previous code examples to support method completion. Compared to other existing code completion tools, CSCC uses new sources of contextual information together with lightweight source code analysis to better recommend API method calls.
2013
- ICSMELHDiff: A Language-Independent Hybrid Approach for Tracking Source Code LinesMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more authorSep 2013
Tracking source code lines between two different versions of a file is a fundamental step for solving a number of important problems in software maintenance such as locating bug introducing changes, tracking code fragments or defects across versions, merging file versions, and software evolution analysis. Although a number of such approaches are available in the literature, their performance is sensitive to the kind and degree of source code changes. There is also a marked lack of study on the effect of change types on source location tracking techniques. In this paper, we propose a language-independent technique, LHDiff, for tracking source code lines across versions that leverages simhash technique together with heuristics to improve accuracy. We evaluate our approach against state-of-the- art techniques using benchmarks containing different degrees of changes where files are selected from real world applications. We further evaluate LHDiff with other techniques using a mutation based analysis to understand how different types of changes affect their performance. The results reveal that our technique is more effective than language-independent approaches and no worse than some language-dependent techniques. In our study LHDiff even shows better performance than a state-of-the-art language- dependent approach. In addition, we also discuss limitations of different line tracking techniques including ours and propose future research directions.
- ICSMELHDiff: Tracking Source Code Lines to Support Software Maintenance ActivitiesMuhammad Asaduzzaman, Chanchal K. Roy, Kevin A. Schneider, and 1 more authorSep 2013
Tracking lines across versions of a file is a necessary step for solving a number of problems during software development and maintenance. Examples include, but are not limited to, locating bug-inducing changes, tracking code fragments or vulnerable instructions across versions, co-change analysis, merging file versions, reviewing source code changes, and software evolution analysis. In this tool demonstration, we present a language-independent line-level location tracker, named LHDiff, that can be used to track lines and analyze changes in various kinds of software artifacts, ranging from source code to arbitrary text files. The tool can effectively detect changed or moved lines across versions of a file, has the ability to detect line splits, and can easily be integrated with existing version control systems. It overcomes the limitations of existing language-independent techniques and is even comparable to tools that are language dependent. In addition to describing the tool, we also describe its effectiveness in analyzing source code artifacts.
- MSRAnswering questions about unanswered questions of Stack OverflowMuhammad Asaduzzaman, Ahmed Shah Mashiyat, Chanchal K. Roy, and 1 more authorMay 2013
Community-based question-answering services accumulate large volumes of knowledge through the voluntary services of people across the globe. Stack Overflow is an example of such a service that targets developers and software engineers. In general, questions in Stack Overflow are answered in a very short time. However, we found that the number of unanswered questions has increased significantly in the past two years. Understanding why questions remain unanswered can help information seekers improve the quality of their questions, increase their chances of getting answers, and better decide when to use Stack Overflow services. In this paper, we mine data on unanswered questions from Stack Overflow. We then conduct a qualitative study to categorize unanswered questions, which reveals characteristics that would be difficult to find otherwise. Finally, we conduct an experiment to determine whether we can predict how long a question will remain unanswered in Stack Overflow.
2012
- MSRBug introducing changes: A case study with AndroidMuhammad Asaduzzaman, Michael C. Bullock, Chanchal K. Roy, and 1 more authorJun 2012
Changes, a rather inevitable part of software development can cause maintenance implications if they introduce bugs into the system. By isolating and characterizing these bug introducing changes it is possible to uncover potential risky source code entities or issues that produce bugs. In this paper, we mine the bug introducing changes in the Android platform by mapping bug reports to the changes that introduced the bugs. We then use the change information to look for both potential problematic parts and dynamics in development that can cause maintenance implications. We believe that the results of our study can help better manage Android software development.
2011
- IWSCVisCad: Flexible Code Clone Analysis Support for NiCadMuhammad Asaduzzaman, Chanchal K. Roy, and Kevin A. SchneiderIn Proceedings of the 5th International Workshop on Software Clones, Jun 2011
Clone detector results can be better understood with tools that support visualization and facilitate in-depth analysis. In this tool demo paper we present VisCad, a comprehensive code clone analysis and visualization tool that provides such support for the near-miss hybrid clone detection tool, NiCad. Through carefully selectedmetrics and visualization techniques VisCad can guide users to explore the cloning of a system from different perspectives.
- ICECCSAnalyzing and Forecasting Near-Miss Clones in Evolving Software: An Empirical StudyMinhaz F. Zibran, Ripon K. Saha, Muhammad Asaduzzaman, and 1 more authorIn 2011 16th IEEE International Conference on Engineering of Complex Computer Systems, Apr 2011
Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in code-bases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and near-miss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.
2010
- SCAMEvaluating Code Clone Genealogies at Release Level: An Empirical StudyRipon K. Saha, Muhammad Asaduzzaman, Minhaz F. Zibran, and 2 more authorsIn 2010 10th IEEE Working Conference on Source Code Analysis and Manipulation, Sep 2010
Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.