cartesiam_logo

NanoEdge AI Studio

I. What is NanoEdge AI Library?

NanoEdge AI Library is an artificial intelligence static library developed by Cartesiam for embedded C software running on ARM Cortex microcontrollers.

When embedded on microcontrollers, it gives them the ability to easily "learn" and "understand" sensor patterns, by themselves, without requiring the user to have additional skills in Mathematics, Machine Learning, or Data Science.

The NanoEdge AI static library is the code that contains an AI model (for example, a bundle of signal treatment, a machine learning model, optimally tuned hyperparameters, etc.) designed to gather knowledge incrementally during a learning phase, in order to become able to detect potentially anomalous machine behaviors, and possibly predict them.

II. Purpose of NanoEdge AI Studio

NanoEdge AI Library contains a range of machine learning models, and each of those models can be optimized by tuning a range of (hyper)parameters. This results in a very large number of potential combinations (static libraries), each one being tailored for a specific application.

NanoEdge AI Studio's purpose is to select the best NanoEdge AI static library possible for your final hardware application (i.e. the piece of code that contains the most relevant machine learning model to your application, tuned with the optimal parameters), in a way that doesn't require you to have advanced skills in Mathematics, Statistics, Data Science, or Machine Learning.

Each NanoEdge AI static library is the result of benchmarking virtually all possible combinations against the datasets provided by the user (a comparison of all possible learning methods, given the user's data).

III. Getting started

1. Running NanoEdge AI Studio for the first time

When running NanoEdge AI Studio for the first time, you will be prompted for:

  • Your proxy settings: if you are using a proxy, use the settings below; otherwise, click NO.

To connect to the licensing API: 104.28.6.103, 104.28.7.103, or via URL: https://api.cryptlex.com:443

To connect to the Cartesiam API for library compilation: 40.113.111.93, or via URL: https://apidev.cartesiam.net

  • The port you want to use.

It can be changed to any port available on your machine (port 5000 by default).

  • Your license key.

If you don't know your license key, please log in to the Cryptlex licensing platform to retrieve it (https://cartesiam.cryptlex.app/auth/login). If you can't find your license key, or have lost your login credentials, please contact us at support@cartesiam.com.

If you don't have an Internet connection, offline activation is available:

  1. Choose Offline activation and enter your license key.
  2. Copy the long string of characters that appears.
  3. Log into your Cryptlex dashboard (https://cartesiam.cryptlex.app/auth/login). If you don't know your login credentials, please contact us at support@cartesiam.com.
  4. Click your license key, then Activations and Offline activation.
  5. Click ACTIVATION, then paste the string of characters from step 2, and click Download response.
  6. In NanoEdge AI Studio, click Import file and open the .dat file downloaded in step 5.

2. Preparing signal files

During the library selection process, NanoEdge AI Studio uses user data (input files containing signals) to test and benchmark many machine learning models and parameters. The way those input files are structured and formatted, and the way the signals were recorded, are therefore very important.

i. Formatting input files properly

Signals

A dataset is a set of values where each value is associated with a variable and an observation. In our case the dataset has a tabular structure (in the form of a TXT or CSV file) where each line corresponds to an observation and each column to a variable. Each row of the dataset comes from a sampling process that involves reading values from a signal at defined intervals, generally regular, producing a series of discrete values called samples.

In summary, the lines represent the signals and the columns represent the samples (or measurements: temperature, pressure, intensity, etc.) constituting the signal. We will use the terms signal and sample to refer to an observation and a value associated with a variable, respectively. You must use as many signals (lines) as possible. Realistically, never use fewer than 20-50 lines per file. We recommend using more than 100; the more, the better.

Samples

During sampling, the signal's values are read, generally at regular intervals. Regardless of the final application and the dataset constituted by the user, we recommend setting a constant time interval between samples, i.e. keeping a constant sampling frequency during data acquisition. This is an important point which can directly affect results. The number of samples per signal (also referred to as "buffer size") is set by the user; however, it must remain constant from one signal to another, i.e. the number of columns in the input file that represents the dataset must be the same from one line to the other.

You need enough samples to cover the whole physical phenomenon studied, but not too many, in order to preserve memory (see next section). We recommend using buffer sizes (i.e. number of samples, or values per line) that are powers of two (e.g. 256, 1024, etc.), and sampling as many values as possible (realistically more than 128); the more, the better.

The accepted format is as follows: each signal is separated by a new line, and each sample making up the line is separated by a separator. The accepted separators are: space ( ), comma (,) and semicolon (;). The separator is expected to be uniform throughout the whole file.

It is crucial to format decimal values using a period (.) and not commas (,).
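As a sketch of these formatting rules, here is one way to generate a correctly formatted signal file from a script (the helper names and the fixed-point formatting are illustrative, not prescribed by the Studio):

```python
# Sketch: writing a NanoEdge-AI-style input file, one signal per line,
# a single uniform separator, and period (.) decimal marks.

def format_signal_line(samples, sep=" "):
    """Join one signal (a list of numeric samples) into a single text line."""
    # Python's format() always uses a period as the decimal mark,
    # regardless of the system locale.
    return sep.join(f"{v:.6f}" for v in samples)

def write_signal_file(path, signals, sep=" "):
    """Write one line per signal; every line must have the same number of samples."""
    lengths = {len(s) for s in signals}
    if len(lengths) != 1:
        raise ValueError("all signals must have the same number of samples")
    with open(path, "w") as f:
        for s in signals:
            f.write(format_signal_line(s, sep) + "\n")
```

Whatever tooling you use, the key points remain the same: one signal per line, a constant number of values per line, one separator throughout, and periods for decimals.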

Example

Here is an example of a signal file corresponding to a 3-axis sensor, e.g. a collection of m signals (m readings, or lines) from a 3-axis accelerometer with a buffer size of 256 on each axis, where each numerical value is separated by a single space:

input_example

Data repetition

A common issue during data acquisition is data repetition. It generally happens when values are read faster than they are written by the sensor. For instance, let's consider an accelerometer that stores acceleration values in a buffer. If, during data acquisition, the buffer is read faster than it is filled by the accelerometer, then some data will be repeated.

To overcome this issue, you must ensure that two successive values do not repeat, by a simple test and / or by setting a timer. Note that when importing a signals file to NanoEdge AI Studio, this type of error will be tested for, and reported to the user.
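The repetition test mentioned above can be sketched in a few lines. This is an illustrative check (the Studio's exact heuristics are not documented here), but it captures the idea of flagging buffers that were read faster than the sensor filled them:

```python
# Sketch: detecting suspicious repetitions in a sampled buffer.

def count_consecutive_repeats(samples):
    """Count positions where a value is identical to its predecessor."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == b)

def has_repetition_issue(samples, tolerance=0):
    """Flag a buffer whose number of consecutive repeats exceeds `tolerance`.
    A few identical neighbors can occur naturally; long runs usually mean
    the buffer was read faster than the sensor wrote it."""
    return count_consecutive_repeats(samples) > tolerance
```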

In the following section we clarify the concepts of sampling frequencies, number of samples and discuss how to choose those parameters.

ii. Choosing a relevant sampling frequency

The sampling frequency corresponds to the number of samples measured per second. For some sensors, the sampling frequency can be directly set by the user, but in other cases, a timer needs to be set up for constant time intervals between each sample.

The speed at which the samples are taken must allow the signal to be accurately described, or "reconstructed"; the sampling frequency must be high enough to account for the rapid variations of the signal. The question of choosing the sampling frequency therefore naturally arises:

  • if the sampling frequency is too low, the readings will be too far apart; if the signal contains relevant features between two samples, they will be lost.
  • if the sampling frequency is too high, it may negatively impact the costs, in terms of processing power, transmission capacity, storage space, etc.

To choose the sampling frequency, prior knowledge of the signal is useful in order to know its maximum frequency component. Indeed, to accurately reconstruct an output signal from an input signal, the sampling frequency should be at least twice as high as the maximum frequency contained in the input signal. In the case where the user has no prior knowledge of the signal, we therefore recommend testing several sampling frequencies (e.g. 200 Hz, 500 Hz, 1000 Hz, etc.) and refining them according to the results obtained via NanoEdge AI Studio / Library.

The issues related to the choice of sampling frequency and the number of samples are illustrated below:

Case 1: the sampling frequency and the number of samples make it possible to reproduce the variations of the signal.

sampling-freq-1

Case 2: the sampling frequency is not sufficient to reproduce the variations of the signal.

sampling-freq-2

Case 3: the sampling frequency is sufficient but the number of samples is not sufficient to reproduce the entire signal (i.e. only part of the input signal is reproduced).

sampling-freq-3

IV. Using NanoEdge AI Studio

In order to generate the NanoEdge AI static library, NanoEdge AI Studio walks the user through several steps:

  1. Creating a new project and setting up its parameters.
  2. Importing "regular signal" files into the Studio.
  3. Importing "abnormal signal" files into the Studio.
  4. Running the library selection process.
  5. Testing the best library found by the Studio.
  6. Compiling and downloading the library.

all_steps

1. Creating a new project

In the main window, you can:

  • Create a new project
  • Load an existing project

3_home_settings

To go back to the project creation screen at any time, click the "home" icon at top-left of the screen (home). Settings (Proxy, License, Port, Language, etc) can be accessed by clicking the gear icon (settings). All NanoEdge AI Studio documentation is available via the docs icon (settings).

To create a new project:

  • Enter your project's name and description
  • Choose your microcontroller type (ARM Cortex M0 to M7)
  • Choose the maximum amount of RAM that you wish to dedicate to AI / Machine Learning (value in kB)
  • Choose the sensor type used to collect data (with the correct number of axes).
  • Then click CREATE.

2_project_details

2. Importing signal files

In the next two steps (steps 2 and 3), you will import two signal files.

  • The Regular signals file, corresponding to nominal machine behavior, i.e. data acquired by sensors during normal use, when everything is functioning as expected.

22_screen2_top

  • The Abnormal signals file, corresponding to abnormal machine behavior, i.e. data acquired by sensors during a phase of anomaly.

33_screen3_top

This abnormal signals file is necessary to give the benchmark algorithms some context, in order to select the best library possible. At this stage, no learning is taking place yet. In later stages, after the optimal library is selected, compiled, and downloaded, it will be completely untrained, and will therefore have no bias whatsoever towards any kind of anomaly. The learning process that will then be performed, either via the NanoEdge AI Emulator or in your embedded hardware application, will be completely unsupervised.

i. Expected file format

Supported file formats are .txt and .csv. Recommended separators are single spaces, commas or semicolons. Please make sure that your input file is correctly formatted:

22_screen2_checks

In this example, we have an input file containing 200 examples of nominal data (200 lines), for a 3-axis accelerometer that uses a buffer size of 256 (which gives 256x3 = 768 numerical values per line).

Please make sure that you are using a sufficient number of lines (50 or more), a constant number of values per line, and uniform separators throughout the file.
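These checks can also be run on your side before importing the file. The sketch below mirrors the checks described above (line count, constant number of values per line, a single uniform separator); the function name and messages are illustrative, and the thresholds follow the recommendations in this document:

```python
# Sketch: pre-checking the raw text of a signal file before import.

def validate_signal_text(text, min_lines=50):
    """Return a list of problems found in the raw text of a signal file."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < min_lines:
        problems.append(f"only {len(lines)} signals; {min_lines}+ recommended")
    # Which of the accepted separators appear anywhere in the file?
    seps = {sep for sep in (" ", ",", ";") if any(sep in ln for ln in lines)}
    if len(seps) > 1:
        problems.append("mixed separators; use a single one throughout")
    sep = seps.pop() if len(seps) == 1 else " "
    # The number of values per line must be constant.
    widths = {len(ln.split(sep)) for ln in lines}
    if len(widths) > 1:
        problems.append("number of values per line is not constant")
    return problems
```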

You will get an alert or an error if your input file is formatted incorrectly:

22_screen2_failcheck

Note: most checks will prevent you from proceeding further. The only non-blocking checks are the last three (checks for constant samples, duplicates, and outliers); they only issue warnings, so proceed with caution.

ii. Data plots

On the right hand side of the screen, you will see a preview of data contained in your input files:

  • For regular signals, in step 2:

22_screen2_graph

  • For abnormal signals, in step 3:

33_screen3_graph

These graphs show a summary of the data contained in each line of your input files. There are as many graphs as sensor axes.

Each value on the x-axis corresponds to the data of one column of your input file. The y-values indicate the mean value of each column (across all signals, or lines), its min-max range, and its standard deviation.

Here, our accelerometer sampled 256 values per line (per axis), so we see 256 points on the graphs' x-axis. Those graphs do not represent a temporal evolution of the behavior of your machine as a whole, but rather a snapshot of the actual physical signals contained on each line, averaged across all lines.

If the values from one line to the next are very similar, or identical, the variance will be low, which means that adding more data won't improve performance much.
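The kind of per-column summary the Studio plots can be recomputed from the input file. This sketch (helper name illustrative) produces, for each column, the mean, min, max, and standard deviation across all lines:

```python
# Sketch: per-column summary statistics across all signals (lines).
import math

def column_stats(signals):
    """signals: list of equal-length rows. Returns one dict per column."""
    stats = []
    for col in zip(*signals):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append({"mean": mean, "min": min(col), "max": max(col),
                      "std": math.sqrt(var)})
    return stats
```

A column whose standard deviation is (near) zero across all lines is exactly the low-variance situation described above.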

Several input files can be loaded (they will be shown on the left side of the screen, see below), for either "regular" or "abnormal" signals, but only one of each at a time will be used for library selection.

22_screen2_sigfiles

4. Running the library selection process

In this 4th step, you will start and monitor the library benchmark. NanoEdge AI Studio will search for the best possible library given the signal files provided in steps 2 and 3 (see previous section).

44_screen4_top_notstarted

i. Starting the benchmark

Click START to open the signal selection window:

55_select_signals

Here, you can select file couples (one nominal signal file + one abnormal signal file) that will be used to test the performance of all candidate libraries.

You can also add several such couples (several groups of two signal files that will be used together) by clicking Add group of signals, as shown below.

55_group_signals

Then, select the number of microprocessor cores of your computer that you wish to dedicate to the benchmark process (see below). Selecting more CPU cores will parallelize the workload of our algorithms and greatly speed up the process. Please use as many as you can, but be aware that using all available CPU cores might temporarily slow down your computer's performance.

55_cpu_and_validate

When you are ready to start the benchmark, click Validate.

ii. Library performance indicators

NanoEdge AI Studio uses 3 indicators to express the performance and relevance of candidate libraries, in the following order of priority:

  • Balanced accuracy
  • Confidence
  • RAM

44_screen4_indicators

Balanced accuracy: This is the ability of the library to classify (i.e. correctly identify) regular signals as regular, and abnormal signals as abnormal. Optimising balanced accuracy is the first priority of our algorithms.

Confidence: This is the ability of the library to mathematically separate abnormal signals from regular ones. More precisely, this "confidence" is the functional margin: the mathematical distance between normal and abnormal signals, and to the decision boundary separating them (90% similarity threshold, see below). Increasing this functional margin is the algorithms' second priority.

RAM: This is the maximum amount of memory space needed by the library after you integrate it on your microcontroller. The maximum amount of RAM used is optimised last.

Along with those 3 indicators, a graph shows a plot of all data points against a percentage of similarity (on the y-axis). Similarity is a measure of how much a given data point fits in with (how much it resembles) the library's existing knowledge base.

Regular signals are shown as blue dots, and abnormal signals as red dots. The threshold (the decision boundary between the two classes, "nominal" and "anomaly"), set at 90% similarity, is shown as a grey dashed line.

44_screen4_plotsignalalone

To illustrate the concepts of balanced accuracy and confidence with the above plot:

  • 100% balanced accuracy would mean that all blue dots are above the 90% threshold, and all red points are below.
  • 100% confidence would mean that all blue dots are at 100% similarity, while all red dots are at 0% similarity.
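Balanced accuracy, in particular, can be computed directly from similarity scores and the 90% threshold. This sketch (function name and score values illustrative) shows the standard definition: the mean of the per-class accuracies, so that an imbalance between regular and abnormal signals does not skew the result:

```python
# Sketch: balanced accuracy from similarity scores, with the 90% threshold.

def balanced_accuracy(regular_scores, abnormal_scores, threshold=90.0):
    """Mean of the two per-class accuracies: regular signals should score
    at or above the threshold, abnormal signals below it."""
    acc_regular = sum(s >= threshold for s in regular_scores) / len(regular_scores)
    acc_abnormal = sum(s < threshold for s in abnormal_scores) / len(abnormal_scores)
    return 100.0 * (acc_regular + acc_abnormal) / 2
```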

iii. Benchmark progress and summary

As soon as the library selection process is initiated, a graph will be displayed on the right hand side of the screen (see below), showing the evolution of the 3 performance indicators (see above section) over time, as thousands of candidate libraries are tested.

44_progress_plot

The selection algorithms will first try to maximise balanced accuracy, then confidence, and finally to decrease the RAM needed as much as possible.

Important note:

The benchmark process may take a long time. Please be patient and let it run to completion. It is however possible to interrupt the process prematurely by clicking STOP (available by hovering your mouse over the progress bar shown below). You will then be able to compile the best library found up to the moment of interruption.

You might want to do so when in a hurry, but be aware that unless the balanced accuracy / confidence are well above 90%, the selected library cannot be expected to perform optimally, or even give results that make sense.

44_time_elapsed

When the benchmark is complete, the above-described progress graph will be replaced by a summary, which includes a plot of the library's learning behavior.

44_screen4_plotiteration

This graph shows the number of learning iterations needed to obtain optimal performance from the library when it is embedded in your final hardware application. In this particular example, NanoEdge AI Studio recommends that learn() be called 80 times, at the very minimum.

Note that realistically, as a general rule, the library's learning phases should never include fewer than 20-50 learning iterations (see the NanoEdge AI Library and/or NanoEdge AI Emulator documentation). The more, the better. However, the way those iterations are run (the precise learning conditions, the time period between them, the sampling frequency used, and so on) is entirely up to the user, as it heavily depends on the project constraints and the types of phenomena measured.

Several successive benchmarks can be run, and will all be saved. They can be loaded by clicking them on the left hand side of the screen.

44_screen4_benchmarks_left

5. Testing the NanoEdge AI Library

In this step (number 5), you have the opportunity to test the library that was selected during the benchmark process (step 4).

5_test_top

This step provides an interface to test the main NanoEdge AI Library functions via the NanoEdge AI Emulator. NanoEdge AI Emulator is a command-line tool that emulates the behavior of the associated library. Therefore, each library, among hundreds of thousands of possibilities, comes with its own emulator.

You can download the NanoEdge AI Emulator and its documentation via the links provided on the left side of the screen:

download_links

Select the benchmark corresponding to the library that you wish to test.

5_emulation_steps

You can run the initialize(), set_sensitivity(), learn() and detect() functions by clicking the associated buttons, and change the parameters manually if needed.

5_emulation_steps

Live and learn is the basic philosophy of NanoEdge AI Library. Normally, the robustness of NanoEdge AI Library's knowledge base increases with the number of reliable input signals. In order to get optimal performance, several rules should be followed:

  1. Input signals need to be uniform in nature (e.g. one shouldn't use temperature patterns for learning and humidity patterns for verification).
  2. "Learn" can be called at any time, if the input signal is reliable (nominal data).
  3. "Detect" can be called at any time, if a minimum knowledge base already exists.
  4. "Initialize" must be called at the beginning, before all other processes, or whenever you want to delete the existing knowledge base and build a new model.

ild
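The calling discipline behind these rules can be illustrated with a deliberately naive toy model. These are NOT the real NanoEdge AI Library calls (see the NanoEdge AI Library documentation for those); the class below only mimics the sequence initialize once, learn on nominal data many times, then detect:

```python
# Toy sketch of the initialize / learn / detect workflow. The "model" is a
# crude centroid-distance stand-in, not the actual NanoEdge AI algorithm.

class ToyAnomalyModel:
    def initialize(self):
        self.knowledge = []            # rule 4: wipes any existing knowledge

    def learn(self, signal):
        self.knowledge.append(signal)  # rule 2: only feed nominal data here

    def detect(self, signal):
        # rule 3: detection needs a minimum knowledge base
        if not self.knowledge:
            raise RuntimeError("call learn() before detect()")
        centroid = [sum(col) / len(col) for col in zip(*self.knowledge)]
        dist = sum((a - b) ** 2 for a, b in zip(signal, centroid)) ** 0.5
        return 100.0 / (1.0 + dist)    # crude similarity percentage
```

After enough learn() calls on nominal signals, a nominal-looking signal scores a high similarity, while a very different one falls below the 90% threshold; the real library follows the same usage pattern with far better mathematics.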

To run all functions sequentially, click the green round button at the bottom.

The outputs will be displayed on the right hand side of the screen:

5_test_outputs

Note: when you start using the library, after the init step, it will be completely "fresh", i.e. its machine learning model will be completely untrained. The learning phase, corresponding to many iterations of the learn() function (use at least the minimum number of iterations recommended in step 4), will be completely unsupervised and incremental.

Feel free to use the "regular signals" input file imported in step 2, as a first training phase. You can then run detection, either using your regular signals, or the "abnormal signals" file used in step 3, to check how many anomalies are detected.

Of course, if you have more data files containing signals that you wish to test, now is the time to check that the library indeed works as intended.

Possible causes of poor results

If you find that the library is performing poorly, make sure that:

  • The datasets used for library selection (benchmark) are coherent with the ones you're using for testing. The regular / abnormal signals, imported in NanoEdge AI Studio in steps 2 and 3 respectively, should correspond to the same machine behaviors / regimes / physical phenomena as the ones used for testing. Namely, the anomalies you're trying to detect should be somewhat similar in nature to the ones used as "abnormal signals".
  • Your machine learning model is rich enough. Don't hesitate to run several learning phases, as long as they all use nominal data as input (only normal, expected behavior should be learned).
  • Your balanced accuracy and confidence scores are well above 90%.
  • You used a sufficient number of signals in both the regular and abnormal signals files. Realistically, make sure that your input files contain more than 20-50 lines.
  • The sampling method used (on your sensors) is adequate for the physical phenomena studied, in terms of frequency, buffer size, duration, etc.

6. Downloading the NanoEdge AI Library

In this last step (number 6), the library will be compiled and downloaded, ready to be either tested via NanoEdge AI Emulator, or directly used in your embedded application.

55_screen5_top

Before compiling the library, several compilation options are available, all checked by default.

55_screen5_dev_options

On the right hand side of the screen, a code snippet is shown. It aims at providing general guidelines about how your code could be structured, and how the NanoEdge AI Library functions should be used (for more information, see the NanoEdge AI Library and/or NanoEdge AI Emulator documentation).

If you ran the selection process several times, make sure that the correct benchmark is selected. Then, when you are ready to download the NanoEdge AI Library, click Compile.

compile

After carefully reading the license contract terms, click Accept and compile. After a short delay, a .zip file will be downloaded to your computer.

6_zip_file

It contains all relevant documentation, the NanoEdge AI Emulator (both win32 and unix versions), two NanoEdgeAI headers (C and C++), and a .json file containing some library details.

You can also re-download any previously compiled library via the archived libraries list:

6_zip_file

Congratulations; you can now use your NanoEdge AI Library and Emulator!

V. Frequently asked questions

Why does my benchmark take so long?

The benchmark step can take a substantial amount of time, especially if your input files contain many signals (> 1000 lines) and/or long signals (>1024 samples per line). This is normal, please be patient. The selection algorithm uses both regular and abnormal datasets to test several machine learning models against your data, along with a multitude of different (hyper)parameters.

However, make sure that your input files are not too large (don’t try to use files containing gigabytes of data, or millions of signals, unless you have substantial computing resources available).

To speed up the process, use as many microprocessor cores as possible (right when you start the benchmark, select more cores by clicking “+”).

Note: benchmarks can be interrupted at any time, without losing any of the progress made. However, they can’t be resumed. If a benchmark is stopped prematurely, you can still move to the “compile” step, for which the best library found so far will be used (in this case, optimal results can’t be expected).

Why are my benchmark results so low (in terms of balanced accuracy and confidence)?

Make sure that your input files contain a sufficient number of signals (lines). We recommend using as many signals as possible, realistically not less than 50, ideally more than 500.

It is also possible that the data contained in your nominal and abnormal signal files look too “similar”, i.e. can’t be correctly attributed / separated by our algorithms within an acceptable threshold. If you get “balanced accuracy” and/or “confidence” values lower than 90%, the selected library (although it is the most relevant one found) might not perform as well as expected, when testing it in later stages.

Resources

All NanoEdge AI Studio documentation is available here.

Step-by-step tutorials, to use NanoEdge AI Studio to build a smart device from A to Z:

Useful links: