Workshop 7. Version Control with git

In this workshop we will learn how to fork a repository, work with branches, and resolve conflicts in git. Everything will run on the Shared Computing Cluster (SCC). You will fork some code written in Python, fix something, and commit your fix. To begin, log in to the SCC:

ssh [user]@scc2.bu.edu

The workshop repository contains two programs. You will be working in teams on both of them.

Forking a repository

On Bitbucket you can fork from the left menu:

[Image: fork_atlassian.gif, the fork option in the Bitbucket left menu]

On GitHub, the Fork button is at the top right of the repository page.

[Image: fork_github.png, the Fork button on GitHub]

You will be divided into groups. One person from each team forks the repository.

  • Fork the repository.
  • Go to your copy of the repository.
  • Click on Send invitation and then Manage this repository.
  • Add your team members and give them Admin access.

Each team member will clone the repository on SCC.

Editing from the server

Go to the Bitbucket website, and find your repository. Go to Source, and open the Readme file. Click Edit to make changes to the Readme, and write your name. Click the Commit button to save your changes.

Running the code

Read the Readme file. You will need to have Python3 and all the required modules installed. If you don’t already have a conda environment, use:

module load anaconda2
conda create --name [env_name] python=3.6.2
source activate [env_name]

We are going to use Python3, so make sure you create an environment accordingly. You can check your Python version using:

python -V

We will need to install some modules in order to run the code.

# install the required libraries
conda install scikit-learn
conda install matplotlib
pip install textblob
python -m textblob.download_corpora

You can run the code now and play around.

python src/digit_recognition_game.py
python src/predict_sentiment.py

Untracked directory

When you run the code, a log file called human_vs_machine.cvs is created, which stores information about each run. You do not want the output of your runs to be uploaded to the repository. To prevent this, create a .gitignore file in the data folder and put the name of the log file in it, on a line of its own.

vim data/.gitignore
ls -a data

Make some changes on digit_recognition_game.py

src/digit_recognition_game.py runs a small program that learns to recognize handwritten digits from low-resolution pictures, then competes with you to see who does better. You will make several changes to improve the code.

Start by entering your own name on line 2 of src/digit_recognition_game.py.

You can see which files you have changed with:

git status

and you can see the differences, i.e. the lines that were changed, with:

git diff

Commit your changes and push them to the server:

git add src/*
git commit -m "[your message]"
git push

Did some of your team members get an error message?

! [rejected]        master -> master (non-fast-forward)
error: failed to push some refs to 'https://[your_username]@bitbucket.org/[owner_repository]/bub_workshop07_git.git'
To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again.  See the 'Note about
fast-forwards' section of 'git push --help' for details.

Resolving conflicts

If your push was rejected, someone else has pushed before you. Pull the recent changes made by others:

git pull

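This automatically merges the changes that do not conflict. If there is a conflict, you will get an error message and the conflicts will be marked in the affected files. The lines between <<<<<<< HEAD and ======= are your version, and the lines between ======= and >>>>>>> are the version you pulled. Edit the file to keep the lines you want, delete the markers, then add and commit the file. The markers look like this: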

<<<<<<< HEAD
aaaaaaa
=======
bbbbbb
>>>>>>> 38b76457af9eba704534f7293817653888c03fc5
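
After resolving a conflict by hand, it is easy to leave a stray marker behind. The following is a minimal Python sketch, not part of the workshop repository, that lists lines that look like conflict markers. The function name find_conflict_markers is made up for illustration, and the check is a heuristic: a file that legitimately contains a line of seven equals signs will be reported too.

```python
import sys


def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that look like git conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # git writes "<<<<<<< <label>", "=======" and ">>>>>>> <label>" at the start of a line
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line == "=======":
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    # Check every file named on the command line and report suspicious lines
    for filename in sys.argv[1:]:
        with open(filename, encoding="utf-8", errors="replace") as f:
            for number, line in find_conflict_markers(f.read()):
                print(f"{filename}:{number}: {line}")
```

Saved under any name you like, it could be run on a file you have just resolved, for example on src/digit_recognition_game.py, before you add and commit it.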

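If you do not want to keep your uncommitted local changes, you can set them aside with git stash. This removes them from your working directory but does not delete them: git stash pop brings them back, and git stash drop discards them for good. To back out of a merge that is already in progress, use git merge --abort.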

Try this on your own

Now let’s improve the code a bit.

  • Change A1. Allow the user to choose the learning algorithm. Currently the program supports Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbors (KNN) classifiers. Prompt the user for a number from 1 to 3 to pick the classifier.

    from sklearn import svm, neighbors, naive_bayes

    def set_classifier(clf="KNN"):
        """Set the type of classifier to use: "SVM", "KNN" or "NB"."""

        # Define ML classifier algorithms we are going to test out
        if clf == "SVM":
            classifier = svm.SVC(kernel="linear")  # Support Vector Machine
        elif clf == "KNN":
            classifier = neighbors.KNeighborsClassifier(9)  # K-Nearest Neighbors
        elif clf == "NB":
            classifier = naive_bayes.GaussianNB()  # Naive Bayes
        else:
            # Unrecognized name: fall back to K-Nearest Neighbors
            classifier = neighbors.KNeighborsClassifier(9)

        return classifier
    
  • Change A2. Check that the user enters a digit between 0 and 9. If the input is not a one-digit number, warn the user and prompt for another number.

    def get_human_prediction():
        """
        Function: Prompts the user for the number they are guessing.
        Returns: (int) number user guessed, between 0 and 9
        """
        human_prediction = None
        while human_prediction is None:
            try:
                human_prediction = int(input("Type the number you saw: "))
            except ValueError:  # the input was not an integer
                human_prediction = None
            # reject anything outside 0-9 and reset so the user is prompted again
            if human_prediction is None or not 0 <= human_prediction <= 9:
                print("Please enter a single digit between 0 and 9.")
                human_prediction = None
        return human_prediction
    
  • Change A3. As you can see, the figure opens in a large window. Can you change this so that it opens at a smaller size?

    fig, ax = plt.subplots(figsize=(3, 3))
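
For Change A1, the number the user types still has to be turned into one of the names that set_classifier() understands. The sketch below shows one possible way to do that. The menu order and the names CLASSIFIER_CHOICES, parse_classifier_choice and prompt_for_classifier are assumptions made for illustration and do not come from the workshop code.

```python
# Assumed menu order; the workshop text does not say which number is which classifier
CLASSIFIER_CHOICES = {"1": "SVM", "2": "NB", "3": "KNN"}


def parse_classifier_choice(text):
    """Return the classifier name for a menu entry, or None if the entry is invalid."""
    return CLASSIFIER_CHOICES.get(text.strip())


def prompt_for_classifier():
    """Keep asking until the user types 1, 2 or 3, then return the classifier name."""
    while True:
        name = parse_classifier_choice(input("Pick a classifier: 1) SVM  2) NB  3) KNN: "))
        if name is not None:
            return name
        print("Please type 1, 2 or 3.")
```

The result could then be passed on as set_classifier(prompt_for_classifier()). Keeping the parsing separate from the prompt makes the mapping easy to check without typing anything.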
    

Try to push and resolve your conflicts again.

Revert changes (undoing the commit)

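git reset HEAD~1

This moves your branch back by one commit (git reset HEAD on its own only unstages files and leaves your commits alone). With --soft the undone changes stay staged, with the default --mixed they stay in your files but are unstaged, and with --hard they are discarded from your working directory. Oh no, did all your changes disappear? You can move backward and forward with git. Get the ID of any commit, for example from git log or git reflog, and you can time travel.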

git reset d656972

Branching and merging

You can make branches to work separately on different features of a tool. This is useful for big teams of developers where each one works on a different module. This is how you make a branch:

# make a branch for your team
git branch [your_branch]
git checkout [your_branch]

Or you could make a branch and check it out at the same time.

git checkout -b [your_branch]

See what branch you are on:

git branch

You have your own local copy on a separate branch.

Make some changes on predict_sentiment.py

src/predict_sentiment.py runs a small program that learns the sentiment (positive or negative) of a sentence from a set of training sentences and tests itself on another set. Make the following changes:

  • Change B1. The training and test sentences are currently hardcoded. Save them into two text files, train.txt and test.txt, and make the program read the data from disk.

    def load_data_from_csv(filename):
        """
        Load data from a 2 column CSV file.
        The data should have a sentence in column 1
        and the sentiment "positive" or "negative" in column 2.
        """
        data = []
        with open(filename, "r", encoding="latin-1") as f:
            for line in f:
                line = line.strip()
                if not line:  # skip blank lines
                    continue
                # split on the last comma so that sentences may contain commas
                sentence, sentiment = line.rsplit(",", 1)
                data.append((sentence, sentiment))
        return data
    
  • Change B2. After learning the sentiments, make the program prompt the user for sentences and guess their sentiment.

    # cl is assumed to be the classifier trained earlier in the script
    input_sentence = input("Type a new sentence: ")
    new_sentence = TextBlob(input_sentence, classifier=cl)
    print("New sentence: %s" % new_sentence)
    print("Predicted Connotation: %s" % new_sentence.classify())
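
Change B1 also asks you to save the hardcoded sentences to disk, which the loader on its own does not cover. Below is a minimal sketch of one way to do it with Python's csv module, which quotes sentences that contain commas. The sentences in sample_train and the function names save_data_to_csv and load_data_with_csv_module are invented for illustration; the real lists live in src/predict_sentiment.py.

```python
import csv

# Made-up (sentence, sentiment) pairs standing in for the hardcoded lists
sample_train = [
    ("I love this workshop", "positive"),
    ("This bug is terrible, really", "negative"),
]


def save_data_to_csv(filename, data):
    """Write (sentence, sentiment) pairs as a 2 column CSV file."""
    with open(filename, "w", newline="", encoding="latin-1") as f:
        csv.writer(f).writerows(data)


def load_data_with_csv_module(filename):
    """Read the pairs back; csv.reader undoes the quoting added by csv.writer."""
    with open(filename, newline="", encoding="latin-1") as f:
        # blank lines come back as empty rows, so skip them
        return [(row[0], row[1]) for row in csv.reader(f) if row]
```

Calling save_data_to_csv("train.txt", sample_train) once would create the file, and load_data_with_csv_module("train.txt") would return the same pairs. If you write the files with csv.writer, read them with csv.reader as well, since a plain split on commas does not remove the quotes.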
    

Push your changes to your own branch. The first time, use git push -u origin [your_branch] so that git creates the branch on the server. There should be no conflicts.

Merge the branch into master

git checkout master
git merge [your_branch]

Hopefully you won’t have conflicts. If you do, you know how to resolve them.

Pull requests

You can inform others of your magnificent changes and accomplishments by making pull requests. This way you let everyone know that you made some changes and that they need to pull.

Go to the repository and click on Pull requests in the left side menu. Create a new pull request. Note: it is better to open pull requests from a branch that contains the changes you have been making.