ε-domination & Optimization   Oct'17 - May'19 (Raise Lab, NCSU)

  • On the Nature of Software Engineering. Implications of ε-domination in Software Engineering 

  • Predictors have high uncertainty (ε > 0). Rather than striving to make ε = 0, used ε > 0 to explore the goal space and reduce the number of evaluations.

  • Considered ~85,000 combinations of algorithms and their parameters as decision variables; with ε > 0, performance plateaus after only tens of combinations. Devised a Tabu-like search, called DODGE, for the optimization. [Link]

Defect Prediction & Class Imbalance & Optimization   Jan'17 - Aug'17 (Raise Lab, NCSU)

  • Is "Better Data" better than "Better Data Miners"? Benefits of Tuning SMOTE for Defect Prediction (Published Paper in ICSE 2018)

  • Adapted a Genetic Algorithm for tuning SMOTE parameters. Compared 6 learners (K-Nearest Neighbor, Support Vector Machine, Naive Bayes, Logistic Regression, Decision Tree, Random Forest) on multiple goals: recall, precision, and false alarm.

  • Achieved statistically significant improvements of 60% in AUC and 20% in recall. Published and presented the work at the ICSE 2018 conference. (Code on GitHub)

TUNING & TOPIC MODELING                                                April'16 - Dec'16 (Raise Lab, NCSU)


  • What's Wrong With Topic Modeling? (And How To Fix it Using Search-based SE). (Published Paper in IST 2018 Journal)

  • Reduced “Order Effects” in Latent Dirichlet Allocation (LDA). Formulated a new metric to evaluate cluster model stability. Implemented Differential Evolution for hyperparameter optimization of LDA.

  • Showed that the new clustering framework outperforms state-of-the-art methods, with better comprehensibility. Tested on Stack Overflow, NASA Logs and Citemap. (Code on GitHub)

SOFTWARE ENGG. & TEXT MINING                                       May'16 - Dec'16 (Raise Lab, NCSU)


  • Trends in Topics at SE Conferences: Preliminary Version. (ICSE-C, 2017)

  • Finding Trends in Software Engineering. (TSE 2018).

  • Examined the trends and patterns in Software Engineering (SE) conferences.

  • Used topic modeling (LDA) to identify the patterns. The dataset, Citemap, is one we generated.

EMAIL LABELLING & INCREMENTAL LEARNING     February'16 - April'16 (Raise Lab, NCSU)


  • Automated Online Email Categorization Tool (Python Code on GitHub).

  • Challenges include "lack of training examples", "compensation to explicit feedback", and many others.

  • Used SVM as a supervised learner with featurizations such as LDA and TF-IDF, together with incremental learning, to extract features from textual data, which achieved better results. The tool labels emails automatically and can be added as an extension to current email systems.

DATA ANALYSIS & BAD SMELLS                                         April'16 - May'16 (Raise Lab, NCSU)


  • Bad Smells of GitHub Project (Python Code on GitHub).

  • GitHub is a widely used online platform for project management and communication, and most open-source software development is now conducted on it. As a result, there is an increasing need to evaluate software projects by their GitHub activity. In this project, we proposed an analyser for GitHub activity along with an early detector of bad smells, based on data from 14 group projects conducted on GitHub from January 1st to April 7th, 2016.

APACHE SPARK & UNSUPERVISED LEARNING          January'16 - April'16 (Raise Lab, NCSU)


  • Built an automated tool to set up an on-demand Spark cluster on Apache VCL nodes with HDFS (Python Code on GitHub).

  • Engineered a DevOps environment in collaboration with Christopher Parnin, using libraries and tools such as xmlrpclib and Ansible. Code base of 2000 LOC.

  • Ran LDA (unsupervised learning) on En-Wikipedia (~60 GB); the job completed successfully in 17 minutes on a distributed 45-node cluster.

  • NC State users can utilize the tool to run big data jobs. 

November'15 (North Carolina State University)

Analyzed, implemented, and optimized 2 large software engineering models using standard optimizers such as NSGA-II and SPEA2, with the Python DEAP library. (Python Code on GitHub).

October'13 - February'14 (PES Institute Of Technology)

  • Built a robot that deciphers its path from captured images alone, using OpenCV and no coordinate system. A learner is trained and then used to decipher the path.

  • The underlying idea is that every object has unique features, which can be extracted and matched against those in another image. [Working Demo video Link]

June'13 - July'13

  • Interned at an Android app development start-up, June to July 2013.

  • Worked on localisation of various apps. Used techniques such as JUnit testing, decompilation of an Android app into Smali code (which acts like an assembly language), and dex2jar conversion.

February'13 - June'13 (PES Institute Of Technology)

  • Developed a Street Vendor Android App (MapSITE, published on Google Play) which enables vendors to disseminate their business details on a map.

  • The services offered are entered as Hindi speech and text. The Hindi language model was built using the CMU Sphinx tool. (Code on GitHub)