This repository was archived by the owner on Jul 18, 2024. It is now read-only.
Steve Martinelli edited this page Sep 5, 2018 · 1 revision

Short Name

Programming Language Classifier

Short Description

Classify programming languages with Watson Studio and Natural Language Classifier

Offering Type

Cognitive

Introduction

With IBM Watson Natural Language Classifier, a data scientist can build a model that looks at text documents and classifies them based on the categories used to build the model. We can use this tool to look at the contents of GitHub, the web-based hosting service for version control using Git, and classify code based on the programming language used. With a Jupyter notebook running on Watson Studio, the data can be cleaned and manipulated, and then the Watson Developer Cloud SDK for Python provides the developer with APIs to create and use a model in Watson Natural Language Classifier.

Author

By Nick Acosta

Code

Demo

  • N/A

Video

  • TBD

Overview

In this Code Pattern, we will use Jupyter Notebooks in IBM Watson Studio to build a model that predicts the programming language of a piece of code based on its text. The model will then be evaluated using IBM's Watson Natural Language Classifier.

When the reader has completed this Code Pattern, they will understand how to:

  • Build a labeled data set.
  • Use Watson Natural Language Classifier to create a predictive model.
  • Build a predictive model within a Jupyter Notebook.
  • Configure and use Watson APIs.
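
As an illustration of the first objective, building a labeled data set, here is a minimal sketch that labels source files by file extension and writes snippet/label rows as CSV. It is not the pattern's own notebook code: the `EXTENSION_LABELS` mapping, the helper names, the 10-line chunk size, the 1024-character cap, and the two-column CSV layout are all assumptions made for the example, and the sample files are hard-coded rather than fetched from GitHub.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical extension-to-label mapping; extend it for the languages you collect.
EXTENSION_LABELS = {
    ".py": "python",
    ".js": "javascript",
    ".java": "java",
    ".go": "go",
    ".rb": "ruby",
}

# Assumed per-example text limit; check the classifier service's documentation.
MAX_CHARS = 1024


def label_for(path):
    """Return a language label for a file path, or None if the extension is unknown."""
    return EXTENSION_LABELS.get(PurePosixPath(path).suffix.lower())


def build_rows(files, chunk_lines=10):
    """Turn (path, source_text) pairs into (snippet, label) rows.

    Each file is cut into chunks of `chunk_lines` non-blank lines,
    so one file can yield several training examples.
    """
    rows = []
    for path, source in files:
        label = label_for(path)
        if label is None:
            continue  # skip files whose language we cannot infer
        lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
        for start in range(0, len(lines), chunk_lines):
            snippet = " ".join(lines[start:start + chunk_lines])
            rows.append((snippet[:MAX_CHARS], label))
    return rows


def to_csv(rows):
    """Serialize rows as CSV text: snippet in column one, label in column two."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hard-coded stand-ins for files that would be collected from GitHub.
sample_files = [
    ("app/main.py", "import os\n\nprint(os.getcwd())\n"),
    ("web/index.js", "const x = 1;\nconsole.log(x);\n"),
    ("README.md", "# docs\n"),
]
rows = build_rows(sample_files)
print(to_csv(rows))
```

Labeling by extension is only a heuristic, but it keeps the labels cheap to produce. Files with unrecognized extensions are skipped rather than guessed at.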

Flow

  1. The developer creates an IBM Watson Studio Workspace.
  2. Using Watson Studio, the developer creates a Jupyter notebook and a Watson Natural Language Classifier instance.
  3. The user can create a new dataset from GitHub, or use the existing one in this repo.
  4. The user interacts with the notebook to build a Naive Bayes classifier and a Natural Language Classifier instance using the Watson Developer Cloud SDK.
  5. The notebook's Python code uses the NLC APIs to create and use a classifier.
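
To make the Naive Bayes step in the flow above more concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing, using only the Python standard library. It is an illustration under stated assumptions, not the notebook's implementation: the tokenizer, the `NaiveBayes` class, the `accuracy` helper, and the four-snippet training set are all hypothetical, and a real run would train on far more data and score on held-out snippets.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed tokenizer: runs of letters/underscores, or single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]+|[^\sA-Za-z_0-9]")


def tokenize(code):
    """Split code into identifier-like words and single punctuation characters."""
    return TOKEN_RE.findall(code)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, snippets, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter(labels)       # label -> number of snippets
        self.vocab = set()
        for snippet, label in zip(snippets, labels):
            tokens = tokenize(snippet)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, snippet):
        tokens = tokenize(snippet)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, snippets, labels):
    """Fraction of snippets whose predicted label matches the true label."""
    correct = sum(model.predict(s) == y for s, y in zip(snippets, labels))
    return correct / len(labels)


# Toy training data, invented for this example.
train = [
    ("def add(a, b): return a + b", "python"),
    ("import os print(os.getcwd())", "python"),
    ("function add(a, b) { return a + b; }", "javascript"),
    ("const x = 1; console.log(x);", "javascript"),
]
texts = [s for s, _ in train]
labels = [y for _, y in train]
model = NaiveBayes().fit(texts, labels)

print(model.predict("def greet(name): print(name)"))
print(model.predict("const y = function() { console.log(2); };"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a token that was never seen for a language from zeroing out that language's score.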

Included components

  • Watson Studio: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
  • Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
  • Watson Natural Language Classifier: Understand the intent behind text passages through custom classifiers, complete with a confidence score.

Featured technologies

  • Artificial Intelligence: Artificial intelligence can be applied to disparate solution spaces to deliver disruptive technologies.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.

Blog

Blog Title

Classify Programming Languages Based on Code with Watson

Blog Author

Nick Acosta

Blog Content

The rise of Python as the go-to programming language for data scientists has made the field one of the more monoglot developer communities, to the point where, given a snippet of non-Python code, a data scientist may ask "What programming language am I even looking at?" Luckily, machine learning models can be built to perform programming language detection for data scientists.

This code pattern covers a few such approaches, including a Naive Bayes classifier and the Watson APIs, to classify a program by its programming language based on its text. The data set was built using GitHub APIs and collected from IBM's org page. The models are then tested for accuracy. This pattern walks data scientists through an introduction to some machine learning and data engineering concepts, without having to use an image classifier dataset like MNIST again! Once the pattern is finished, its Python-heavy users may even discover a newfound understanding of a different programming language!

Links

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns.
  • Data Analytics Code Patterns: Enjoyed this Code Pattern? Check out our other Data Analytics Code Patterns.
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos.
  • Watson Studio: Master the art of data science with IBM's Watson Studio.