ML Environment

From CCN Wiki

The lab primarily uses a Linux environment. We have several workstations running the latest version of Ubuntu. Our machine learning (ML) research is primarily carried out using Google's TensorFlow libraries, written for Python. Workstations with powerful GPUs can use CUDA for GPU acceleration.

Using the CCN Lab ML Environment

To avoid clashes between different releases of Python, TensorFlow, and whatever libraries a particular ML application requires, we use virtual environments. We have a Python 3.9 virtual environment, which acts like a sandbox: all our ML libraries can be installed into it and toggled on or off without disrupting anything else. Each user has to complete a one-time setup procedure:

  1. ssh or open a terminal on ws03
  2. run the following command:
    conda init bash
  3. log out, log back in. Now your shell prompt will look like:
    (base) user@host:~$
    • To toggle it on:
      conda activate /data/tf
    • To toggle it off:
      conda deactivate
    • To list all other environments that you can activate:
      conda info --envs
    • To list all the packages installed in your current environment:
      conda list
  4. If the conda list output doesn't include the tensorflow libraries, install TensorFlow using pip:
    pip install tensorflow
    • Other libraries may have to be installed separately by each user using pip. For example, scikit-learn (imported in Python as sklearn) is installed under its PyPI name:
      pip install scikit-learn
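
The check in step 4 can also be done from inside Python rather than by reading through the conda list output. The following is a minimal sketch, not part of the lab setup: missing_packages is a hypothetical helper name, and the package list is only an example. It relies on the standard library's importlib.metadata, so it reports on whichever environment the running interpreter belongs to.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed
    in the environment of the running interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example wish list (an assumption); edit it to match your project.
    wanted = ["tensorflow", "scikit-learn"]
    for name in missing_packages(wanted):
        print(f"missing: {name}  (try: pip install {name})")
```

Run it after conda activate, so that it inspects the activated environment rather than base. Note that it expects PyPI distribution names (scikit-learn), not import names (sklearn).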

Setting up your own ML environment

Below are the specs and walkthrough for setting up a modest ML programming environment.

Hardware

There are no particular hardware requirements for running TensorFlow. Obviously, more disk space and RAM are desirable, as is a CUDA-enabled GPU.
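
If you are not sure whether a machine has a usable GPU, a quick check along these lines may help. This is a sketch under assumptions: the function names are made up for illustration, the presence of the nvidia-smi tool is used only as a rough hint that an NVIDIA driver is installed, and the TensorFlow part assumes TensorFlow 2.x (it is skipped if TensorFlow is not installed yet).

```python
import shutil


def nvidia_smi_available():
    """Rough hint that an NVIDIA driver is installed: is nvidia-smi on PATH?"""
    return shutil.which("nvidia-smi") is not None


def tensorflow_gpus():
    """Return the names of the GPUs TensorFlow can see,
    or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print("nvidia-smi found:", nvidia_smi_available())
    gpus = tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("GPUs visible to TensorFlow:", gpus or "none")
```

An empty list from tensorflow_gpus() does not stop TensorFlow from working; it just means computation will run on the CPU.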

Operating System

The closed Apple ecosystem might make macOS a good choice of platform because a Mac is a Mac is a Mac. However, I'm not Mr. Moneybags over here, and Apple drops support for older hardware after a time. You couldn't pay me to use Windows for anything but Office applications, so that leaves Linux. The TensorFlow installation directions assume Ubuntu 16.04 or higher, and though I have dabbled with Debian Linux for a home media server, I typically use Ubuntu. These instructions assume Ubuntu 22.04 LTS.

Python Version

Ubuntu 22.04 has retired Python 2.x and ships Python 3.10 by default. TensorFlow works nicely with Python 3.9 (I understand that getting TensorFlow to work with Python 3.10+ includes some challenges), so these instructions assume Python 3.9. If you want to use other versions of Python for other applications, I recommend using virtual environments to manage and switch between Python versions. The pip installation instructions use miniconda to create a Python 3.9 virtual environment.
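
Because these instructions assume Python 3.9, it can be worth confirming which interpreter a script is actually running under before debugging anything else. A minimal sketch follows; python_matches is a hypothetical helper, and the (3, 9) default simply mirrors the version assumed on this page.

```python
import sys


def python_matches(expected=(3, 9), actual=None):
    """Return True if the (major, minor) version equals `expected`.
    `actual` defaults to the running interpreter's version."""
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual) == tuple(expected)


if __name__ == "__main__":
    if python_matches():
        print("Python 3.9: OK")
    else:
        print("Expected Python 3.9, found %d.%d" % sys.version_info[:2])
```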

Setup

TensorFlow Installation

First off, you can use Conda to manage your virtual environments, but don't use Conda to install TensorFlow itself. Instead, follow the pip installation instructions published by the TensorFlow people. The steps are broadly described below:

Create a virtual environment

The first step is to install miniconda if it isn't already installed. Then, create a new Python 3.9 virtual environment:

conda create --name tf python=3.9

Then activate your new environment:

conda activate tf
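
A common slip is running Python from base after forgetting to activate the new environment. As a rough sketch, a script can check this itself: conda sets the CONDA_DEFAULT_ENV environment variable on activation (for an environment activated by path, it typically holds the path instead of a name). The helper name active_conda_env is an assumption for illustration.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment (name or path), or None
    if no conda environment appears to be active."""
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    env = active_conda_env()
    if env is None:
        print("No conda environment is active")
    else:
        print(f"Active conda environment: {env}")
```

With the environment created above activated, this should report tf; if it reports base, run conda activate tf first.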

Update pip and install tensorflow

You'll need to update pip first:

pip install --upgrade pip

Then pip install tensorflow:

pip install tensorflow
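
Once pip finishes, a short smoke test can confirm that TensorFlow both imports and computes. The sketch below assumes TensorFlow 2.x with eager execution (the default); tensorflow_smoke_test is a made-up name, not something from the TensorFlow documentation.

```python
def tensorflow_smoke_test():
    """Return the TensorFlow version string after a tiny computation,
    or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Summing a 2x2 tensor of ones exercises the runtime, not just the import.
    total = tf.reduce_sum(tf.ones([2, 2]))
    if float(total) != 4.0:
        raise RuntimeError("TensorFlow imported but returned a wrong result")
    return tf.__version__


if __name__ == "__main__":
    version = tensorflow_smoke_test()
    if version is None:
        print("TensorFlow is not installed in this environment")
    else:
        print(f"TensorFlow {version} is working")
```

TensorFlow may print informational messages about CPU instructions or missing GPU libraries on import; those are not errors in themselves.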

Spyder Installation

Anthony got me using the Spyder IDE for Python programming. Problem is, as of today (July 6, 2022), it's buggy on Ubuntu 22.04. You can install it using pip, as documented here:

pip install -U spyder

However, when I uninstalled the apt package dependencies (there are LOTS), the pip installation no longer worked, so I reinstalled the apt package as well. The apt package binary still won't launch, but at least the pip version will have everything it needs. It's possible that having the apt package installed prior to the pip installation caused pip to not bother installing some critical components. If that's the case, then it may not be strictly necessary to have both the apt and pip versions installed.

Note that pip will have installed spyder into a specific virtual environment (e.g., tf). You won't be able to launch spyder without first activating that virtual environment. After installation, you can launch it from a terminal window:

(base) user@host:~$ conda activate tf
(tf) user@host:~$ spyder