tflm_label_image_ext_mem#
Overview#
This example demonstrates a TensorFlow Lite Micro-based image classifier that uses the external memory technique to handle large neural network models. It showcases how to fetch model weights from external xSPI flash memory to internal SRAM for execution on the Neutron NPU.
Key Features#
- Model Architecture: MobileNet V1 1.0 224 quantized convolutional neural network (~4.3 MB model size)
- External Memory Support: Model stored in xSPI flash; weights fetched to internal SRAM for NPU execution
- NPU Acceleration: Utilizes the Neutron NPU with the fetch_constants_to_sram parameter
- Input: 3-channel color image (224×224 pixels)
- Output: Classification into 1000 classes
Usage Modes#
The example implements a complete image classification pipeline:
- Static Image Processing: Processes a built-in static image (Grace Hopper) regardless of camera connection
- Camera Processing (optional): When a camera and display are connected, performs real-time image classification
The semihosting implementation causes a slower or discontinuous video experience. To use an external debug console via UART (virtual COM port) instead, select UART in 'Project Options' during project import.
A 3-channel color image is input to the quantized MobileNet convolutional neural network model, which classifies the image into one of 1000 output categories.
External Memory Technique#
The key innovation in this example is the fetch-weights-to-SRAM mechanism:
1. Model Storage: The complete model (~4.3 MB) is stored in external xSPI flash memory.
2. Weight Fetching: During inference, model weights are fetched from external flash to an internal SRAM scratch buffer.
3. NPU Execution: The Neutron NPU executes operations using the weights cached in internal SRAM.
This technique enables running large models that exceed the available internal SRAM capacity.
Model Conversion#
The converted TensorFlow Lite Micro model mobilenet_v1_1.0_224_int8_npu.tflite is generated by the Neutron Converter tool.
You can obtain the Neutron Converter tool from the eIQ Neutron SDK.
After downloading and extracting the Neutron SDK Zip package, you can find the Neutron Converter tool at the following path:
/eiq-neutron-sdk-linux-x.x.x/bin/neutron-converter.
The following commands are required for model conversion. Please note:

- Make sure to add the fetch_constants_to_sram true parameter.
- In this example, the program loads model data from the mobilenet_v1_1.0_224_int8_npu.tflite file rather than using the model array from the model_data.h file.
# Set environment variables
export NEUTRON_SDK_PATH="/path/to/eiq-neutron-sdk-linux-x.x.x"
export LD_LIBRARY_PATH="${NEUTRON_SDK_PATH}/lib:${LD_LIBRARY_PATH}"
export PATH="${NEUTRON_SDK_PATH}/bin:${PATH}"
./neutron-converter --input mobilenet_v1_1.0_224_int8.tflite \
--output mobilenet_v1_1.0_224_int8_npu.tflite \
--target imxrt700 \
--dump-header-file-output \
--fetch_constants_to_sram true
The minimum size of SRAM scratch memory is displayed in the Neutron Converter terminal output. Note the value: SRAM scratch = 1,082,368 (bytes)
Memory:
Total data = 1,355,760 (bytes) (Inputs + Outputs + Intermediate Variable Tensors)
Total weights = 4,349,168 (bytes) (Weights)
Total size = 5,704,928 (bytes) (All)
SRAM scratch = 1,082,368 (bytes) (SRAM scratch for fetching weights from FLASH)
Running the Demo#
Example output on the MIMXRT700-EVK board with the Arm GCC toolchain:
Label image example using a TensorFlow Lite Micro model.
Detection threshold: 23%
Model: mobilenet_v1_1.0_224_int8_npu
Core/NPU Frequency: 324 MHz
TensorArena Addr: 0x20102050 - 0x20282050
TensorArena Size: Total 0x180000 (1572864 B); Used 0x14b234 (1356340 B)
Model Addr: 0x28200000 - 0x28626100
Model Size: 0x426100 (4350208 B)
Total Size Used: 5706548 B (Model (4350208 B) + TensorArena (1356340 B))
Static data processing:
----------------------------------------
Inference time: 51982 us
Detected: military uniform (88%)
----------------------------------------
Camera data processing:
Camera input is currently not supported on this device
How to Adapt for Larger Models#
The internal SRAM scratch buffer for weights is scratchWeightsBuffer[SCRATCH_WEIGHTS_SRAM_SIZE], defined in mcuxsdk/middleware/eiq/tensorflow-lite/tensorflow/lite/micro/kernels/neutron/neutron.cpp. It starts at the base address of non-cacheable memory (0x20400000 in this example). SCRATCH_WEIGHTS_SRAM_SIZE is 1200 KB in this example.
Linker File Configuration#
In the linker file, the m_data and m_ncache sections are configured as below:
m_data (RW) : ORIGIN = 0x20100000, LENGTH = 0x00300000
m_ncache (RW) : ORIGIN = 0x20400000, LENGTH = 0x00140000
Steps for Supporting Larger Models#
When running a larger model:

1. Check the minimum size of SRAM scratch memory reported during model conversion.
2. Ensure SCRATCH_WEIGHTS_SRAM_SIZE is greater than the minimum required scratch memory size.
3. Ensure the total length of the m_ncache memory region in the linker script is large enough to accommodate SCRATCH_WEIGHTS_SRAM_SIZE.
Example: Adding 1 MB of SRAM Scratch Memory#
If an additional 1 MB is needed for SRAM scratch memory:

1. Adjust linker sections: update the m_data and m_ncache sections' ORIGIN and LENGTH as needed:

m_text (RX) : ORIGIN = 0x28004300, LENGTH = 0x002FBD00
m_data (RW) : ORIGIN = 0x20000000, LENGTH = 0x00300000
m_ncache (RW) : ORIGIN = 0x20300000, LENGTH = 0x00240000
m_model (RW) : ORIGIN = 0x28300000, LENGTH = 0x00A00000

2. Increase SCRATCH_WEIGHTS_SRAM_SIZE to 2224 KB:

static const unsigned int SCRATCH_WEIGHTS_SRAM_SIZE = 2224 * 1024;

3. Adjust the tensor arena size as needed:

const int kTensorArenaSize = 2048 * 1024;