Seeq Data Lab

 

What is Seeq Data Lab?

Seeq Data Lab is a service that brings together Jupyter Notebook, a web application that combines a Python REPL and Markdown, with the Seeq API. This enables users to create Python-based notebooks and use the SPy library that Seeq distributes to push, pull, and manipulate data using Seeq, and then develop reports and visualizations from that data.

Frequently Asked Questions

Why use Seeq Data Lab vs. Seeq Workbench/Organizer?

Seeq Data Lab offers the ability to automate work ordinarily done in Seeq Workbench/Organizer, for example scaling an analysis across a fleet of similar assets or creating dashboards for an entire unit. It also enables ways of visualizing your data, such as radar charts or box-and-whisker plots, that are not otherwise available in Seeq Workbench.

How do I get Seeq Data Lab?

The best way to get Seeq Data Lab is by using Seeq’s Software-as-a-Service (SaaS) offering. All of the technical details of installation and maintenance are handled for you, and you will benefit from scalability features that are only available via Seeq SaaS. If you are not yet taking advantage of Seeq SaaS, please contact us to start a conversation.

If Seeq SaaS is not yet an option, we offer Seeq Data Lab Server – a single-machine configuration that is suitable for on-premise or private-cloud use. Note that Seeq Data Lab Server is only supported on Linux, and only on the Ubuntu and Red Hat Enterprise Linux distributions.

Can Seeq Data Lab Server and Seeq Server be installed on the same hardware/VM?

No – Seeq Data Lab Server must be installed on its own hardware/VM so that it does not compete for resources with the Seeq Server. Both pieces of software assume that they have complete use of CPU, memory and disk space on the machine.

Does Seeq Data Lab adversely affect performance of Seeq Server?

Seeq Data Lab executes Python code separately from the main Seeq Server. Therefore, the Python code itself will not “compete” for CPU/memory resources with Seeq Workbench or Seeq Organizer directly. However, the SPy module accesses data through Seeq Server and has facilities to leverage the Seeq calculation engine, so there can be material impacts to Seeq Server load when Data Lab is used. You will see this impact in the Server Load percentage in the bottom-right corner of Seeq Workbench/Organizer.

Can I schedule a notebook to run automatically?

This functionality will be available in R52.

What other Python libraries can I use in Seeq Data Lab?

You can use any library that is accessible from the Seeq Data Lab Server (from the web or a company repository) and approved in your organization’s environment.

When should I use a dedicated ML-optimized platform (such as Azure ML Studio or SageMaker) instead of Seeq Data Lab?

Seeq Data Lab is best suited for general purpose computing, including: 

  • data visualization

  • workflow automation

  • modest data science applications

  • Seeq-embeddable interactive UI applications

Seeq Data Lab does not compete with ML-optimized platforms, which are the best fit for data science applications involving:

  • high performance and distributed compute management 

  • management and versioning of datasets, models, and source code

  • modeling of hundreds of calculated features and years of data

In these cases, SPy is still recommended for usage in external platforms for easy data access and/or operationalization back to Seeq.

Can Seeq Data Lab Server be connected to multiple Seeq servers?

No, Seeq Data Lab Server can only be connected to one Seeq server.

Can a cloud-based Seeq Data Lab Server be used with an on-premise Seeq server?

Yes, if you are deploying Seeq Data Lab into your private cloud, but there must be a private network connection between Seeq Data Lab Server and Seeq Server. That can be achieved with certain cloud provider features like Azure ExpressRoute and AWS Direct Connect.

It is not possible to use SaaS-based Seeq Data Lab with a non-SaaS-based Seeq Server; that is, you cannot connect a Seeq SaaS Data Lab server to your on-premise or private-cloud Seeq Server.

Can I use the Seeq Python (SPy) library outside of Seeq Data Lab?

You can use SPy outside of Data Lab, but you will still need a Data Lab license. The SPy library may be imported and used in any project that supports Python. You could use SPy inside AWS SageMaker, Azure Machine Learning Studio, any other Jupyter Notebook-based development environment, or any other Python environment.

If you want to use Add-on Tools in Seeq, you will also need a Data Lab license.
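As a rough illustration of using SPy from a Python environment outside Data Lab, the sketch below logs in to a Seeq Server, searches for a signal, and pulls one day of data. The function name pull_signal_sample, the time range, and the grid are illustrative assumptions, and the sketch assumes the seeq package from PyPI is installed in a version that matches your Seeq Server.

```python
def pull_signal_sample(url, username, password, signal_name):
    """Log in to a Seeq Server with SPy and pull one day of data for a signal.

    Assumes the `seeq` package is installed in this Python environment and
    that a Data Lab license is in place, as described above.
    """
    # Imported lazily so this function can be defined where `seeq` is absent.
    from seeq import spy

    spy.login(url=url, username=username, password=password)

    # spy.search returns a pandas DataFrame describing the matching items.
    items = spy.search({"Name": signal_name, "Type": "Signal"})

    # Pull the matching signals over an illustrative time range and grid.
    return spy.pull(items, start="2021-01-01", end="2021-01-02", grid="15min")


# Example call (all values are placeholders for your own server and account):
# data = pull_signal_sample("https://seeq.example.com", "user", "secret", "Temperature")
```

Reading the credentials from environment variables or a secrets store, rather than hard-coding them, is generally the safer choice in shared notebooks.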

Installing Seeq Data Lab Server

Seeq Data Lab Server is supported on Ubuntu and Red Hat Enterprise Linux (RHEL) and relies heavily on the container platform called Docker.

Minimum Hardware Requirements

The hardware required for running Seeq Data Lab Server on Docker is highly dependent on how Data Lab Server is used. Each Data Lab project requires memory and CPU based on the number and nature of the notebooks running within it.

If you are finding that operations within notebooks are slow, are running out of memory, or are running out of disk space, you will need to increase CPU, memory and/or storage resources. Seeq Data Lab relies completely on the hardware resources that have been allocated to the virtual machine that Docker is running on.

Based on our observations, each non-resource-intensive notebook consumes about 1600 MiB of memory and 800 millicores of CPU.
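To make the arithmetic concrete, here is a minimal sketch that scales the per-notebook figures quoted above to a given number of simultaneously executing simple notebooks. The function name estimate_resources is illustrative, and the result is only a starting estimate, since real notebooks can be far more demanding.

```python
import math

# Per-notebook figures quoted above for a non-resource-intensive notebook.
MEMORY_MIB_PER_NOTEBOOK = 1600
CPU_MILLICORES_PER_NOTEBOOK = 800


def estimate_resources(simultaneous_notebooks):
    """Return (cpu_cores, memory_gib) for the given number of simple notebooks."""
    if simultaneous_notebooks < 0:
        raise ValueError("notebook count must be non-negative")
    # 1000 millicores make one core; round up to whole cores.
    cpu_cores = math.ceil(simultaneous_notebooks * CPU_MILLICORES_PER_NOTEBOOK / 1000)
    # 1024 MiB make one GiB.
    memory_gib = simultaneous_notebooks * MEMORY_MIB_PER_NOTEBOOK / 1024
    return cpu_cores, memory_gib


for count in (10, 40, 80, 160):
    cores, memory = estimate_resources(count)
    print(f"{count} notebooks: about {cores} cores, {memory:.1f} GiB")
```

For 10 notebooks this gives 8 cores and roughly 15.6 GiB, which is in line with the 8 cores and 16 GB listed for that tier in the sizing guidance in this section.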

As adoption of Seeq Data Lab increases, users will likely leverage the scheduled notebook execution mechanism (spy.jobs), which will increase the number of simultaneously executing notebooks over time.

The table below gives general guidance on sizing.

Simultaneously Executing Simple Notebooks   up to 10   up to 40   up to 80   up to 160
CPU Architecture                            64-bit (all configurations)
CPU Cores                                   8          32         64         128
Memory                                      16 GB      64 GB      128 GB     256 GB
Available Storage                           100 GB free disk space (smaller configurations); 500 GB free disk space (larger configurations)
OS                                          Ubuntu LTS versions 18.04 - 20.04; Red Hat Enterprise Linux 7.6 - 8.4

Prerequisites

Currently, Seeq Data Lab Server is only supported on two Linux distributions - Ubuntu and Red Hat Enterprise Linux (RHEL).

To install Data Lab on Linux with Docker, everything you need is included in the seeq-data-lab-<version>-64bit-linux.tar.gz installer tarball. You can find the link to download it here: https://www.seeq.com/customer-download.

Install

Go to one of the following articles based on the version you are running:

https://seeq.atlassian.net/wiki/spaces/KB/pages/1034059842

Install Seeq Data Lab Server R22.0.49.XX or Earlier

Upgrading

  1. Stop the Seeq Data Lab service by issuing sudo seeq stop.

  2. Back up your installation appropriately before upgrading; see Backup and Restoration of Seeq Data Lab below.

  3. Follow the same instructions as in the Install section, using the new version.

RHEL: Restore SELinux Contexts

On RHEL, before starting the service again, make sure to restore the SELinux contexts:

    sudo restorecon -R /opt/seeq

Backup and Restoration of Seeq Data Lab

Because Seeq Data Lab interacts with its underlying storage using standard File System semantics, backups can occur while Seeq Data Lab is running; system down-time is not required.

For installations where Data Lab is deployed on a RHEL or Ubuntu server running on a virtual machine, the underlying VM snapshot and restoration mechanisms native to VMware, Azure, AWS, or other cloud services work well. Backups can be scheduled and executed by the IT infrastructure without special handling for Seeq Data Lab. In general, Seeq recommends full-system backup/restore/DR practices when deploying Data Lab running Docker on RHEL or Ubuntu.

For those wishing to be more explicit, or for installations wishing to back up just the Seeq files, backing up the contents of the seeq home directory (such as /home/seeq if the installation followed the example listed at the beginning of this article) is sufficient and complete. The files can be restored directly into the same directory; a restart of the seeq-data-lab service is recommended following any restoration.

To reconstitute a Seeq Data Lab when the Data Lab filesystem has been individually backed up:

  1. Restore or deploy a fresh VM to host Seeq Data Lab, following the instructions listed above for deploying Seeq Data Lab.

  2. Download and re-install the same version of Seeq using the same command option as the original installation (such as “-g /home/seeq”).

  3. Ensure the seeq-data-lab service is stopped.

  4. Restore the contents of /home/seeq from the backup

  5. Start the seeq-data-lab service
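The file-level backup and restore described above could be scripted along the following lines. This is a minimal sketch rather than a Seeq-provided tool: the function names are hypothetical, it assumes the seeq home directory is /home/seeq or similar, and it assumes the archive is written somewhere outside that directory.

```python
import tarfile
from pathlib import Path


def backup_seeq_home(seeq_home, archive_path):
    """Write a gzip-compressed tar archive of the seeq home directory."""
    seeq_home = Path(seeq_home)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname="." stores paths relative to the home directory itself.
        archive.add(seeq_home, arcname=".")
    return Path(archive_path)


def restore_seeq_home(archive_path, seeq_home):
    """Extract a backup archive into the seeq home directory.

    Per the steps above, the seeq-data-lab service should be stopped first.
    """
    seeq_home = Path(seeq_home)
    seeq_home.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(seeq_home)


# Example calls (paths are placeholders):
# backup_seeq_home("/home/seeq", "/backups/seeq-home.tar.gz")
# restore_seeq_home("/backups/seeq-home.tar.gz", "/home/seeq")
```

File ownership is typically only restored when extraction runs as root, so it is worth checking ownership and permissions on the restored files before starting the service.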

Using Seeq Data Lab

Once Seeq has been successfully installed, users can log in and get to the familiar Seeq home page. A Data Lab Project can be accessed via the same drop-down that is used to create Workbench Analyses and Organizer Topics.

Click on it and you will see a SPy Documentation folder that contains relevant documentation with example data, showcasing functionality in Seeq Data Lab.

This environment may look familiar to you if you have worked with Jupyter Notebooks. To get started, open Tutorial.ipynb or any other notebook with the .ipynb extension and start exploring; all of your data connected to Seeq is now at your fingertips. Happy Seeqing!

Installing Python Modules

External Python modules that are needed by the end user can be installed to the Seeq Data Lab project from a notebook cell using the standard pip command:

    !pip install <package>

After installation, you must shut down and restart the notebook’s kernel (Kernel > Shutdown, then Kernel > Restart). Forgetting this step is a common mistake.

If the installation fails to complete, please check the requirements for the specific Python package to ensure that they are all met by the Seeq Data Lab server and/or project.

Using the Terminal

When installing, uninstalling or inspecting the Python package environment, it’s often more convenient to use a Jupyter Terminal. You can launch a terminal from the Jupyter Home Page by selecting New > Terminal. Once you’re inside the terminal, you don’t have to precede the commands with an exclamation point.

Some modules are “built-in” to Data Lab. For example, Pandas and NumPy are always available and you can inspect the version by doing:

    pip show <package_name>

You can upgrade or downgrade these packages to a specific version. For example:

    pip install pandas==1.1.3
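The installed version of a package can also be checked from Python inside a notebook, which is handy for confirming that an upgrade or downgrade took effect after the kernel restart. The sketch below uses the standard importlib.metadata module; the helper name installed_version is illustrative, and on Python 3.7 it assumes the importlib_metadata backport is available.

```python
try:
    from importlib import metadata  # Python 3.8 and later
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    import importlib_metadata as metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("pandas", "numpy"):
    print(name, installed_version(name))
```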

It is best to ensure that no Python kernels are running while you are adding or removing packages. You can see the running kernels by clicking on the Running tab on the Jupyter Home Page. You can shut everything down from there.

Packages that are specific to your project are installed to the ~/.local/lib/python3.7/site-packages folder. Sometimes pip will have trouble installing or uninstalling a package and leave you in a bad state. You can always cd to this folder and use rm -rf <folder_name> to manually remove packages.

The seeq module, which includes SPy, is also built-in. However, a new functional version may be available on PyPI that you wish to utilize. You can upgrade by executing the following command:

    pip install -U seeq~=<major_version>.<minor_version>

Where <major_version> and <minor_version> correspond to the version numbers for Seeq Server / Seeq Data Lab. (The above example is the format for Seeq Server / Seeq Data Lab versions R50 and later. For R22 and earlier, refer to the instructions on PyPI.)

This command will override the built-in version for this project only and this version will remain even if Seeq Data Lab is upgraded. Thus, you may need to issue the above command again if the major_version changes.
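To illustrate why the command may need to be reissued after a major version change, the sketch below builds the requirement string and mimics how pip's compatible-release operator (~=) treats a two-part version: it accepts versions with the same first component that are at least <major_version>.<minor_version>. The helper names and the version numbers used are hypothetical, and the check only handles plain dotted numeric versions.

```python
def seeq_requirement(major_version, minor_version):
    """Build the pip requirement string in the format shown above."""
    return f"seeq~={major_version}.{minor_version}"


def satisfies(candidate, major_version, minor_version):
    """Check a dotted numeric version against `~=<major>.<minor>`.

    For a two-part specifier, the compatible-release operator is equivalent
    to ">= <major>.<minor>" combined with "== <major>.*".
    """
    parts = tuple(int(piece) for piece in candidate.split("."))
    return parts[0] == major_version and parts >= (major_version, minor_version)


print(seeq_requirement(54, 1))
print(satisfies("54.2.7", 54, 1))  # same major version, newer minor version
print(satisfies("55.0.0", 54, 1))  # different major version
```

Here the first check is accepted and the second is rejected, so a project pinned this way stays on its major version until the command is run again with new numbers.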

You can also choose to revert to the built-in version of SPy by doing the following:

    pip uninstall -y seeq

Installing Packages Globally (to All Projects)

In R52 and later, admins can now install Python modules for use in all projects by installing external packages into the folder /seeq/python/global-packages. The preferred way to do this is to use PYTHONUSERBASE:

    export PYTHONUSERBASE=/seeq/python/global-packages
    pip install <package>

Make sure to restart the kernel after installation.

Python packages that are installed to /seeq/python/global-packages will be overridden by packages installed to ~/.local/lib/python3.7/site-packages.
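When the same package exists both globally and in a project, it can be useful to confirm which copy a notebook would actually import. The following sketch uses only the Python standard library; the helper name module_origin is illustrative, and it assumes that the locations involved appear on sys.path for the running kernel.

```python
import importlib.util
import site
import sys


def module_origin(module_name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


print("User site-packages:", site.getusersitepackages())
print("pandas would load from:", module_origin("pandas"))

# Earlier sys.path entries take precedence over later ones.
for position, entry in enumerate(sys.path):
    print(position, entry)
```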

To uninstall a global package:

    export PYTHONUSERBASE=/seeq/python/global-packages
    pip uninstall <package>

Note that all pip commands intended to reference or include global packages must also use PYTHONUSERBASE=/seeq/python/global-packages as a prefix before the desired pip command.

    export PYTHONUSERBASE=/seeq/python/global-packages
    pip show <package>
    pip list

In R52 and later, admins can now install Jupyter nbextensions for all projects. All notebooks are configured to search for extensions in the /seeq/python/global-packages folder. The preferred way to install an nbextension to this folder is to use JUPYTER_CONFIG_DIR and JUPYTER_DATA_DIR.

    # setup folder for Jupyter to install extension
    export JUPYTER_CONFIG_DIR=/seeq/python/global-packages
    export JUPYTER_DATA_DIR=/seeq/python/global-packages

    # install nbextension in standard way
    jupyter nbextension install --user --symlink --overwrite --py <package>
    jupyter nbextension enable --user --py <package>

Make sure to restart the kernel after installation.

Note that the safest approach is to install nbextensions for all projects only from packages that are themselves available to all projects.

Note that exported environment variables remain in effect for the rest of your current terminal session. If you need to install project-specific packages after installing packages globally, unset PYTHONUSERBASE first.
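One way to avoid leaving the variable set is to apply it to a single command only. The sketch below does this from Python by passing a modified environment to a child process; the helper name run_with_global_userbase is hypothetical, and the demonstration only prints the user base the child process sees rather than installing anything.

```python
import os
import subprocess
import sys

GLOBAL_PACKAGES = "/seeq/python/global-packages"  # folder named in this article


def run_with_global_userbase(args, userbase=GLOBAL_PACKAGES):
    """Run a Python command with PYTHONUSERBASE set for that command only."""
    env = dict(os.environ, PYTHONUSERBASE=userbase)
    completed = subprocess.run(
        [sys.executable, *args],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout


# Show which user base the child process sees; the parent process is unchanged.
print(run_with_global_userbase(["-c", "import site; print(site.getuserbase())"]))

# A global install would follow the same pattern (not executed here):
# run_with_global_userbase(["-m", "pip", "install", "<package>"])
```

Because the variable is never exported in the calling session, later project-specific installs are unaffected.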

 

Shared “pip.conf” file

To share pip configuration across all SDL projects, admins can create and configure a "pip.conf" file in the /seeq/python/global-packages/.config folder.

Here is an example pip.conf file:

    [install]
    extra-index-url = https://pypi.example.com
    trusted-host = pypi.example.com

Pip config files also support a list of extra and trusted URLs. For more information, refer to the official documentation.
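As an illustration of listing several URLs and hosts, the sketch below renders a pip.conf with multi-line values using Python's configparser, which writes each additional value on an indented continuation line. The second repository host is hypothetical, and the sketch only produces the file text; where it is saved is up to the admin.

```python
import configparser
import io

# Hypothetical repositories; replace with your organization's hosts.
INDEX_URLS = ["https://pypi.example.com", "https://pypi.internal.example.com"]
TRUSTED_HOSTS = ["pypi.example.com", "pypi.internal.example.com"]


def render_pip_conf(index_urls, trusted_hosts):
    """Return pip.conf text with one value per line for each option."""
    config = configparser.RawConfigParser()
    config["install"] = {
        "extra-index-url": "\n".join(index_urls),
        "trusted-host": "\n".join(trusted_hosts),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_pip_conf(INDEX_URLS, TRUSTED_HOSTS))
```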

For private repository configuration, admins can also create a file “.netrc” with credentials in the /seeq/python/global-packages/.config folder. This will enable users to download packages without entering credentials.

Here is an example .netrc file:

    machine pypi.example.com
    login your_pipy_example_com_login
    password your_pipy_example_com_password
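Before relying on a .netrc file, it can help to confirm that it parses as intended. The sketch below uses Python's standard netrc module on a throwaway copy with placeholder credentials; the helper name read_credentials and the credential values are illustrative only.

```python
import netrc
import os
import tempfile

NETRC_TEXT = """\
machine pypi.example.com
login example_login
password example_password
"""


def read_credentials(netrc_path, host):
    """Return (login, password) for a host, or None if no entry matches."""
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Write a throwaway copy so the sketch does not touch a real credentials file.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, ".netrc")
    with open(path, "w") as handle:
        handle.write(NETRC_TEXT)
    print(read_credentials(path, "pypi.example.com"))
```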

Troubleshooting

Starting Seeq & updating the configuration from the Seeq Prompt fails with a permission denied error

Problem: When starting Seeq and updating the configuration from the Seeq Prompt, the following error is seen:

    PermissionError: [Errno 13] Permission denied: '/opt/seeq/seeq-data-lab/install.properties'

Solution: While logged in as a user with appropriate permissions, run the following command and try again:

    chown seeq:seeq /opt/seeq/seeq-data-lab/install.properties