MCUXpresso SDK Documentation

tflm_label_image_ext_mem#

Overview#

This example demonstrates a TensorFlow Lite Micro-based image classifier that uses the external memory technique to handle large neural network models. It showcases how to fetch model weights from external xSPI flash memory to internal SRAM for execution on the Neutron NPU.

Key Features#

  • Model Architecture: MobileNet V1 1.0 224 quantized convolutional neural network (~4.3 MB model size)

  • External Memory Support: Model stored in xSPI Flash and weights fetched to internal SRAM for NPU execution

  • NPU Acceleration: Utilizes Neutron NPU with fetch_constants_to_sram parameter

  • Input: 3-channel color image (224×224 pixels)

  • Output: Classification into 1000 classes

Usage Modes#

The example implements a complete image classification pipeline:

  1. Static Image Processing: Processes a built-in static image (Grace Hopper) regardless of camera connection

  2. Camera Processing (optional): When camera and display are connected, performs real-time image classification

    Note: The semihosting debug console can make the video output slow or discontinuous. To use an external debug console over UART (virtual COM port) instead, select UART in ‘Project Options’ during project import.

A 3-channel color image is input to the quantized MobileNet convolutional neural network model, which classifies the image into one of 1000 output categories.

External Memory Technique#

The key innovation in this example is the fetch-weights-to-SRAM mechanism:

  • Model Storage: The complete model (~4.3 MB) is stored in external xSPI flash memory

  • Weight Fetching: During inference, model weights are fetched from external flash to internal SRAM (scratch buffer)

  • NPU Execution: The Neutron NPU executes operations using weights cached in internal SRAM

This technique enables running large models that exceed the available internal SRAM capacity.

Model Conversion#

The converted TensorFlow Lite Micro model mobilenet_v1_1.0_224_int8_npu.tflite is generated by the Neutron Converter tool.

  • You can obtain the Neutron Converter tool from the eIQ Neutron SDK.

  • After downloading and extracting the Neutron SDK Zip package, you can find the Neutron Converter tool at the following path: /eiq-neutron-sdk-linux-x.x.x/bin/neutron-converter.

Run the following commands to convert the model. Note:

  • Make sure to add the fetch_constants_to_sram true parameter.

  • In this example, the program loads model data from the mobilenet_v1_1.0_224_int8_npu.tflite file, rather than using the model array from the model_data.h file.

# Set environment variables
export NEUTRON_SDK_PATH="/path/to/eiq-neutron-sdk-linux-x.x.x"
export LD_LIBRARY_PATH="${NEUTRON_SDK_PATH}/lib:${LD_LIBRARY_PATH}"
export PATH="${NEUTRON_SDK_PATH}/bin:${PATH}"

./neutron-converter --input mobilenet_v1_1.0_224_int8.tflite \
                    --output mobilenet_v1_1.0_224_int8_npu.tflite \
                    --target imxrt700 \
                    --dump-header-file-output \
                    --fetch_constants_to_sram true

The Neutron Converter prints the minimum required SRAM scratch size in its terminal output. Note this value: SRAM scratch = 1,082,368 (bytes)

Memory:
  Total data    = 1,355,760 (bytes) (Inputs + Outputs + Intermediate Variable Tensors)
  Total weights = 4,349,168 (bytes) (Weights)
  Total size    = 5,704,928 (bytes) (All)
  SRAM scratch  = 1,082,368 (bytes) (SRAM scratch for fetching weights from FLASH)

Running the Demo#

Example output on the MIMXRT700-EVK board with the Arm GCC toolchain:

Label image example using a TensorFlow Lite Micro model.
Detection threshold: 23%
Model: mobilenet_v1_1.0_224_int8_npu
Core/NPU Frequency: 324 MHz
TensorArena Addr: 0x20102050 - 0x20282050
TensorArena Size: Total 0x180000 (1572864 B); Used 0x14b234 (1356340 B)
Model Addr: 0x28200000 - 0x28626100
Model Size: 0x426100 (4350208 B)
Total Size Used: 5706548 B (Model (4350208 B) + TensorArena (1356340 B))

Static data processing:
----------------------------------------
     Inference time: 51982 us
     Detected: military uniform (88%)
----------------------------------------

Camera data processing:
Camera input is currently not supported on this device

How to Adapt for Larger Models#

The internal SRAM buffer used to stage weights is scratchWeightsBuffer[SCRATCH_WEIGHTS_SRAM_SIZE], defined in the file mcuxsdk/middleware/eiq/tensorflow-lite/tensorflow/lite/micro/kernels/neutron/neutron.cpp.

The buffer starts at the base address of the non-cacheable memory region (0x20400000 in this example), and SCRATCH_WEIGHTS_SRAM_SIZE is 1200 KB in this example.

Linker File Configuration#

In the linker file, the m_data and m_ncache sections are configured as below:

m_data      (RW)  : ORIGIN = 0x20100000, LENGTH = 0x00300000
m_ncache    (RW)  : ORIGIN = 0x20400000, LENGTH = 0x00140000

Steps for Supporting Larger Models#

When running a larger model:

  1. Check the minimum size of SRAM scratch memory reported during model conversion.

  2. Ensure SCRATCH_WEIGHTS_SRAM_SIZE > minimum required scratch memory size.

  3. Ensure the total length of m_ncache memory in the linker script is large enough to accommodate SCRATCH_WEIGHTS_SRAM_SIZE.

Example: Adding 1 MB of SRAM Scratch Memory#

If an additional 1 MB is needed for SRAM scratch memory:

  1. Adjust linker sections: update the ORIGIN and LENGTH of the m_data and m_ncache sections as needed:

    m_text      (RX)  : ORIGIN = 0x28004300, LENGTH = 0x002FBD00
    m_data      (RW)  : ORIGIN = 0x20000000, LENGTH = 0x00300000
    m_ncache    (RW)  : ORIGIN = 0x20300000, LENGTH = 0x00240000
    m_model     (RW)  : ORIGIN = 0x28300000, LENGTH = 0x00A00000
    
  2. Increase SCRATCH_WEIGHTS_SRAM_SIZE to 2224 KB:

    static const unsigned int SCRATCH_WEIGHTS_SRAM_SIZE = 2224 * 1024;
    
  3. Adjust tensor arena size as needed:

    const int kTensorArenaSize = 2048 * 1024;
    

Supported Boards#