Finding the models
I became interested in Apple’s on-device safety models after reading about NeuralHash, which is a perceptual hashing model that Apple uses to detect known CSAM (child sexual abuse material) images on users’ devices without uploading the actual images to the cloud. NeuralHash is part of a larger family of models that Apple has developed for various classification tasks. After successfully reverse engineering NeuralHash, I wanted to see what other safety models Apple devices have.
They come in the form of espresso models, which usually consist of three files: .espresso.net, .espresso.shape, and .espresso.weights. You can find these files by running find /System/Library/ -name "*espresso*". You can see through the naming convention that these models are used for several things such as image classification, object detection, text summarization, and more. One of the more interesting ones that I found is called SafetyNetLight, which is a model that classifies images into 10 safety categories. It’s used in Apple’s content moderation features to help identify potentially harmful content on users’ devices. I created a library called espresso2onnx that converts these espresso models to a more usable ONNX format.
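As a quick inventory aid, the results of that find command can be grouped into complete .net/.shape/.weights triplets. A small sketch (the paths below are illustrative, not real filenames from /System/Library):

```python
from collections import defaultdict

ESPRESSO_SUFFIXES = (".espresso.net", ".espresso.shape", ".espresso.weights")

def group_espresso_models(paths):
    """Group espresso file paths into {model_prefix: {suffixes}} so complete
    models (all three files present) can be told apart from strays."""
    models = defaultdict(set)
    for path in paths:
        for suffix in ESPRESSO_SUFFIXES:
            if path.endswith(suffix):
                models[path[: -len(suffix)]].add(suffix)
                break
    return dict(models)

# Illustrative paths only -- not real filenames from /System/Library:
found = group_espresso_models([
    "/tmp/models/SafetyNetLight.espresso.net",
    "/tmp/models/SafetyNetLight.espresso.shape",
    "/tmp/models/SafetyNetLight.espresso.weights",
    "/tmp/models/orphan.espresso.net",
])
complete = [p for p, s in found.items() if len(s) == 3]
```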
Info → When doing the original research on NeuralHash, all of the espresso files were plain JSON. This is true for any of the older models like pet classifier, sound classifier, etc. The newer espresso models in `CoreSceneUnderstanding.framework` use a different format entirely: a custom `pbze` header followed by an LZFSE compressed payload.
Apple already exposes a few public APIs that use some of these models. Looking through the documentation, I found VNClassifyImageRequest, VNGenerateImageFeaturePrintRequest, and SCSensitivityAnalyzer.
VNGenerateImageFeaturePrintRequest is particularly interesting because it gives us access to the same 768-dimensional “sceneprint” embedding that SafetyNetLight consumes. By extracting the sceneprint via the public Vision API and comparing it against our ONNX model’s output, we can verify that our converted model is producing correct results. Similarly, VNClassifyImageRequest uses the same SceneNet backbone internally for its 1,374 scene categories, giving us another point of verification.
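One way to do that comparison, once both embeddings are dumped to float arrays: check their cosine similarity, which should be very close to 1.0 if the conversion is faithful. The vectors below are synthetic stand-ins for the Vision and ONNX outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors; a value near 1.0
    means the converted model reproduces the reference embedding."""
    a = np.asarray(a, dtype=np.float32).ravel()
    b = np.asarray(b, dtype=np.float32).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in 768-d vectors; in practice these come from Vision and onnxruntime.
rng = np.random.default_rng(0)
reference = rng.standard_normal(768).astype(np.float32)
converted = reference + rng.standard_normal(768).astype(np.float32) * 1e-3
sim = cosine_similarity(reference, converted)
```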
Analysis of CoreSceneUnderstanding.framework networks
Searching the /System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/ directory, we find a number of espresso models and taxonomies:
ls -laR /System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/CoreSceneUnderstanding.framework/Resources/
├── scenenet_v5_model/
| ├── scenenet_sydro_model_default_config.json
│ └── SceneNet_v5.13.0_8wiqmpbbig_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.2/
│ ├── *.espresso.net
│ ├── *.espresso.shape
│ └── *.espresso.weights
├── scenenet_v5_custom_classifiers/
│ ├── SafetyNetLight/
│ │ └── SafetyNetLight_v1.1.0/
│ │ ├── *.espresso.net
│ │ ├── *.espresso.shape
│ │ └── *.espresso.weights
│ ├── EventsLeaf/
│ ├── JunkLeaf/
│ ├── JunkHierarchical/
│ ├── CityNature/
│ └── SemanticDevelopment/
└── taxonomies/
├── SafetyNetLight/
│ └── SafetyNetLight-v1a_vocabulary00__leaf.bplist
└── EntityNet/
└── ... (label mappings)
We discover a taxonomy list stored as a binary plist (bplist) file for the SafetyNetLight model, revealing the following classification categories:
plutil -p /System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/taxonomies/SafetyNetLight/SafetyNetLight-v1a_vocabulary00__leaf.bplist
[
0 => "unsafe"
1 => "sexual"
2 => "violence"
3 => "gore"
4 => "weapon_violence"
5 => "weapon_any"
6 => "drugs"
7 => "medically_sensitive"
8 => "riot_looting"
9 => "terrorist_hate_groups"
]
Now we can look at the structure of the espresso files. Running xxd reveals the magic number 70 62 7a 65 (“pbze”), marking a proprietary container format.
xxd /System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/scenenet_v5_custom_classifiers/SafetyNetLight/SafetyNetLight_v1.1.0/SafetyNetLight_v1.1.0_vx6zphgfsp_15880_safetynet_quant.espresso.net | head -n 5
00000000: 7062 7a65 0000 0000 4000 0000 0000 0000  pbze....@.......
00000010: 0000 1c0b 0000 0000 0000 0524 6276 7832 ...........$bvx2
00000020: 0b1c 0000 4402 c017 0046 0140 f51b 2082 ....D....F.@.. .
00000030: 07e7 0210 bd00 0000 3f1c 9002 cfc0 ec88 ........?.......
00000040: aaaa 8a22 30a6 e9d8 2319 635e 2c12 40c9  ..."0...#.c^,.@.
I couldn’t find any official Apple documentation on this format, but 28 bytes in we see bvx2, the magic number for an LZFSE-compressed block. We can confirm this by crafting a dd command that skips the header and pipes the rest of the file into lzfse for decompression:
dd if=/System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/scenenet_v5_custom_classifiers/SafetyNetLight/SafetyNetLight_v1.1.0/SafetyNetLight_v1.1.0_vx6zphgfsp_15880_safetynet_quant.espresso.net bs=1 skip=28 2>/dev/null | lzfse -decode -o safetynet.espresso.net.json
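The same unwrapping can be sketched in Python. The header layout beyond the magic is undocumented; I only assume what the hexdump showed — a pbze magic at offset 0 and an LZFSE block magic at byte 28. Actually decompressing the stream would additionally need an LZFSE binding such as pyliblzfse, which is left as a comment:

```python
PBZE_MAGIC = b"pbze"
# LZFSE block magics: bvx- (uncompressed), bvx1/bvx2 (compressed), bvxn (LZVN)
LZFSE_MAGICS = (b"bvx-", b"bvx1", b"bvx2", b"bvxn")
PAYLOAD_OFFSET = 28  # observed in the hexdump; assumed constant

def unwrap_pbze(data: bytes) -> bytes:
    """Strip the pbze container header, returning the raw LZFSE stream.
    Raises ValueError if the bytes don't match the observed layout."""
    if data[:4] != PBZE_MAGIC:
        raise ValueError("not a pbze container")
    payload = data[PAYLOAD_OFFSET:]
    if payload[:4] not in LZFSE_MAGICS:
        raise ValueError("no LZFSE magic at expected offset")
    return payload
    # then: liblzfse.decompress(payload) would yield the JSON

# Synthetic container mimicking the observed layout:
fake = PBZE_MAGIC + b"\x00" * 24 + b"bvx2" + b"compressed-bytes"
stream = unwrap_pbze(fake)
```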
cat safetynet.espresso.net.json | head -n 40
{
"storage" : "SafetyNetLight_v1.1.0_vx6zphgfsp_15880_safetynet_quant.espresso.weights",
"analyses" : {
},
"properties" : {
"mldb_token" : "mldb-br4yn3dam9"
},
"format_version" : 200,
"metadata_in_weights" : [
],
"layers" : [
{
"pad_r" : 0,
"fused_relu" : 1,
"fused_tanh" : 0,
"debug_info" : "input.8",
"pad_fill_mode" : 0,
"pad_b" : 0,
"pad_l" : 0,
"top" : "42",
"K" : 768,
"blob_biases" : 1,
"quantization_lut_weights_blob" : 3,
"name" : "input.8",
"has_batch_norm" : 0,
"type" : "convolution",
"n_groups" : 1,
"pad_t" : 0,
"has_biases" : 1,
"C" : 1024,
"bottom" : "image_embed_normalize_out",
"weights" : {
},
"Nx" : 1,
"pad_mode" : 0,
"pad_value" : 0,
"Ny" : 1,We discover that the input is a convolution layer with 1024 input channels and 768 output channels. This suggests that SafetyNetLight is not a standalone model, but rather a custom classifier that is built on top of the SceneNetv5 model, which produces the image embeddings that are fed into this model. The weights and quantization lookup tables are stored in the weights file and the shape file contains the dimensions of the blobs. We can extract the shape file in the same way:
dd if=/System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/scenenet_v5_custom_classifiers/SafetyNetLight/SafetyNetLight_v1.1.0/SafetyNetLight_v1.1.0_vx6zphgfsp_15880_safetynet_quant.espresso.shape bs=1 skip=28 2>/dev/null | lzfse -decode -o safetynet.espresso.shape.json
cat safetynet.espresso.shape.json | head -n 20
{
"layer_shapes" : {
"x.2" : {
"k" : 1024,
"w" : 1,
"n" : 1,
"_rank" : 4,
"h" : 1
},
"42" : {
"k" : 1024,
"w" : 1,
"n" : 1,
"_rank" : 4,
"h" : 1
},
"input.6" : {
"k" : 1024,
"w" : 1,
"n" : 1,Before extracting the details from the weights files, we can run the same process on the SceneNetv5 model to see what it looks like.
dd if=/System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/scenenet_v5_model/SceneNet_v5.13.0_8wiqmpbbig_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.2/SceneNet_v5.13.0_8wiqmpbbig_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.2.espresso.shape bs=1 skip=28 2>/dev/null | lzfse -decode -o scenenet.espresso.shape.json
dd if=/System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/scenenet_v5_model/SceneNet_v5.13.0_8wiqmpbbig_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.2/SceneNet_v5.13.0_8wiqmpbbig_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.2.espresso.net bs=1 skip=28 2>/dev/null | lzfse -decode -o scenenet.espresso.net.json
SceneNetv5 exposes the following output heads:
| Output | Shape | Description |
|---|---|---|
| inner/sceneprint | [1, 768, 1, 1] | Used by SafetyNetLight |
| classification/labels | [1, 1374, 1, 1] | Scene classification (1374 categories) |
| entitynet/labels | [1, 7287, 1, 1] | Entity recognition (7287 categories) |
| aesthetics/scores | [1, 2, 1, 1] | Aesthetic quality |
| aesthetics/attributes | [1, 21, 1, 1] | Aesthetic attributes (21 dimensions) |
| detection/scores | [1, 30, 90, 90] | Object detection |
| detection/coordinates | [1, 4, 90, 90] | Bounding boxes |
| fingerprint/embedding | [1, 4, 6, 6] | Image fingerprint (144-d) |
| saliency/map | [1, 1, 68, 68] | Saliency map |
| objectness/map | [1, 1, 68, 68] | Objectness map |
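The head shapes above can be read straight out of the decompressed shape JSON; a small summarizer, run here against a hand-built fragment rather than the full scenenet.espresso.shape.json:

```python
def summarize_heads(layer_shapes, names):
    """Return {name: (n, k, h, w)} for the requested output blobs,
    using the n/k/h/w keys seen in the espresso shape files."""
    out = {}
    for name in names:
        s = layer_shapes[name]
        out[name] = (s["n"], s["k"], s["h"], s["w"])
    return out

# Hand-built fragment mirroring the decompressed shape file's structure:
layer_shapes = {
    "inner/sceneprint": {"n": 1, "k": 768, "h": 1, "w": 1, "_rank": 4},
    "classification/labels": {"n": 1, "k": 1374, "h": 1, "w": 1, "_rank": 4},
}
heads = summarize_heads(layer_shapes, ["inner/sceneprint", "classification/labels"])
```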
Looking at the scenenet_sydro_model_default_config.json file we found with our earlier ls, we can see how the full pipeline is put together.
cat scenenet_sydro_model_default_config.json | jq
{
"data_input_key": "image",
"model_relative_path": "models/scenenet_v5_model/SceneNet_v5.11.1_47tazbjgzq_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.1/SceneNet_v5.11.1_47tazbjgzq_fe1.3_sc3.3_sa2.4_ae2.4_so2.4_od1.5_fp1.5_en0.1.espresso.net",
"model_input_shape": [
0,
3,
360,
360
],
"model_input_range": [
0,
255
],
"model_input_channel_order": "nchw",
"model_input_key": "image",
"model_outputs": [
{
"key": "classification/labels",
"taxonomy": "taxonomies/applenet-scenenet-v5-20220719.json",
"vocabulary": "leaf",
"data_output_key": "scenenet_classification_labels"
},
{
"key": "aesthetics/attributes",
"taxonomy": "taxonomies/aesthetics-v8e.json",
"vocabulary": "basic",
"data_output_key": "scenenet_aesthetics_attributes"
},
{
"key": "aesthetics/scores",
"taxonomy": "taxonomies/aesthetics-v8e.json",
"vocabulary": "global",
"data_output_key": "scenenet_aesthetics_scores"
},
{
"key": "saliency/map",
"overlay_result_on_key": "image",
"data_output_key": "scenenet_saliency_map"
},
{
"key": "objectness/map",
"overlay_result_on_key": "image",
"data_output_key": "scenenet_objectness_map"
},
{
"key": "inner/sceneprint",
"data_output_key": "scenenet_sceneprint"
},
{
"key": "detection/coordinates",
"data_output_key": "scenenet_detection_coordinates"
},
{
"key": "detection/scores",
"data_output_key": "scenenet_detection_scores"
},
{
"key": "entitynet/labels",
"taxonomy": "taxonomies/EntityNet/entitynet_labels-v0a.json",
"vocabulary": "basic",
"data_output_key": "entitynet_classification_labels"
}
]
}
The image classification flow is as follows:
┌─────────────────────────────────────────────────────────────────┐
│ Image (360x360 RGB) │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ SceneNet v5 (778 layers, 21 MB weights) │ │
│ │ Input: raw pixels [1, 3, 360, 360] │ │
│ │ Output: 768-d "sceneprint" embedding │ │
│ └──────────────────┬───────────────────────────────┘ │
│ │ │
│ │ 768 floating point numbers │
│ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ SafetyNetLight (15 layers, 1.9 MB weights) │ │
│ │ Input: 768-d sceneprint [1, 768, 1, 1] │ │
│ │ Output: 10 safety probabilities [1, 10, 1, 1] │ │
│ └──────────────────┬───────────────────────────────┘ │
│ │ │
│ ▼ │
│ 10 independent safety scores (each 0.0 to 1.0) │
└─────────────────────────────────────────────────────────────────┘
Parsing the weights
Across multiple layers, the net files contain fields describing how the weights are stored; here’s an example from the first convolution layer:
"quantization_lut_weights_blob" : 3,
"quantization_ranges_blob" : 5,This tells us that the weights are stored in quantized format, with lookup tables (LUT) for dequantization. According to Apple’s documentation on quantization, this is a way to reduce the model size and improve inference speed by using lower precision (e.g., 8-bit integers) instead of full precision (e.g., 32-bit floats). The LUTs are used to map the quantized values back to their original floating-point values during inference, therefore:
- Weights blob (id=3): Contains uint8 indices (1 byte per weight)
- Ranges blob (id=5): Contains float32 range values for dequantization
I didn’t know how to map the uint8 indices back to float32 weights, so I had to experiment to find the correct dequantization formula. I found documentation on some Apple-specific algorithms, but nothing on how this LUT quantization works. I tried several formulas based on common quantization schemes, and only one produced results that made sense given the values in the ranges blob. With some Codex help, I landed on the correct one, which turned out to match TensorFlow Lite’s quantization.
To verify, I looked at the final classifier layer (512 -> 10), which had only 20 range values for 10 output channels:
fc_ranges = np.frombuffer(blobs[33], dtype=np.float32) # 20 values
fc_mins = fc_ranges[:10] # First half
fc_maxs = fc_ranges[10:] # Second half
# Output showed clearly symmetric pairs:
# Channel 0: min=-0.150, max=+0.149
# Channel 1: min=-0.269, max=+0.267
# Channel 4: min=-1.276, max=+1.266
This confirmed the split layout. The final dequantization formula is:
import numpy as np

def dequantize_lut_weights(indices_blob, ranges_blob, out_channels, in_channels, kh=1, kw=1):
    """
    Apple's LUT quantization format:
    - indices_blob: uint8 array (1 byte per weight, values 0-255)
    - ranges_blob: float32 array, split as [all_mins | all_maxes]
    Dequantization: weight = min + (index / 255) * (max - min)
    """
    num_weights = out_channels * in_channels * kh * kw
    indices = np.frombuffer(indices_blob[:num_weights], dtype=np.uint8)
    ranges = np.frombuffer(ranges_blob, dtype=np.float32)
    # Split layout
    mins = ranges[:out_channels]
    maxs = ranges[out_channels:]
    # Reshape and dequantize
    indices_2d = indices.reshape(out_channels, in_channels * kh * kw).astype(np.float32)
    scales = (maxs - mins) / 255.0
    weights = mins[:, None] + indices_2d * scales[:, None]
    return weights.reshape(out_channels, in_channels, kh, kw)
Converting to ONNX and running inference
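Before trusting the converted weights, the dequantization scheme can be sanity-checked with a synthetic round trip: quantize known weights using the same per-channel [all_mins | all_maxes] layout, dequantize, and confirm the reconstruction error stays within half a quantization step:

```python
import numpy as np

def quantize(w):
    """Per-output-channel affine quantization to uint8 indices plus a
    [all_mins | all_maxes] float32 ranges array, mirroring the observed layout."""
    mins, maxs = w.min(axis=1), w.max(axis=1)
    scales = (maxs - mins) / 255.0
    idx = np.round((w - mins[:, None]) / scales[:, None]).astype(np.uint8)
    return idx, np.concatenate([mins, maxs]).astype(np.float32)

def dequantize(idx, ranges, out_channels):
    """Inverse mapping: weight = min + index * (max - min) / 255."""
    mins, maxs = ranges[:out_channels], ranges[out_channels:]
    scales = (maxs - mins) / 255.0
    return mins[:, None] + idx.astype(np.float32) * scales[:, None]

# Synthetic 10x512 classifier weights in the range seen in the ranges blob:
rng = np.random.default_rng(0)
w = rng.uniform(-0.15, 0.15, size=(10, 512)).astype(np.float32)
idx, ranges = quantize(w)
w2 = dequantize(idx, ranges, out_channels=10)
step = float((ranges[10:] - ranges[:10]).max() / 255.0)
err = float(np.abs(w - w2).max())
```

With 8-bit indices the worst-case error per weight is half a step, which for these ranges is on the order of 6e-4 — negligible for a classifier head.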
After making some modifications to my espresso2onnx script, I was finally able to extract the ONNX model by running: python3 espresso2onnx.py /tmp/safetynet_model/ -o safetynet_model.onnx
Then I can do the same for the SceneNetv5 model: python3 espresso2onnx.py /tmp/scenenet_model/ -o scenenet_model.onnx
And finally, we can run inference on some sample images to see the safety scores:
import onnxruntime as ort
from PIL import Image
import numpy as np
img = Image.open("photo.jpg").convert("RGB").resize((360, 360))
x = np.array(img, dtype=np.float32) / 127.5 - 1.0 # Scale to [-1, 1]
x = x.transpose(2, 0, 1)[np.newaxis, ...] # [1, 3, 360, 360]
# backbone
backbone = ort.InferenceSession("scenenet_sceneprint.onnx")
sceneprint = backbone.run(None, {"image": x})[0]
# safety classifier
safety = ort.InferenceSession("safetynet_model.onnx")
scores = safety.run(None, {"image_embed_normalize_out": sceneprint})[0]
print(scores.flatten())
# Output: [0.01, 0.05, 0.02, 0.00, 0.03, 0.01, 0.00, 0.00, 0.00, 0.00]
Converting the remaining classifier heads
Six classifier heads ship alongside the SceneNetv5 model. We can convert all of them to ONNX and run inference on sample images to see what each one outputs, which gives a better sense of what each head does and how they relate; comparing outputs across heads can also surface correlations.
Here are the classifier heads I found in scenenet_v5_custom_classifiers/:
| Head | Output Size | Purpose |
|---|---|---|
| SafetyNetLight | 10 | Safety categories |
| EventsLeaf | 62 | Event types |
| JunkLeaf | 12 | Junk detection (i.e. blurry photo, screenshot, etc.) |
| JunkHierarchical | 5 | Junk hierarchy |
| CityNature | 3 | City vs nature |
| SemanticDevelopment | 2 | Food vs landscape |
To convert them all:
CSU="/System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework/Versions/A/Resources/scenenet_v5_custom_classifiers"
for head in JunkLeaf EventsLeaf JunkHierarchical CityNature SemanticDevelopment; do
dir=$(ls -d "$CSU/$head"/*/ | head -1)
mkdir -p /tmp/head && ln -sf "$dir"*.espresso.* /tmp/head/
python3 espresso2onnx.py /tmp/head -o ${head}.onnx
rm -rf /tmp/head
done
All five converted successfully; each is a small model that takes the 768-d sceneprint as input.
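Once converted, all of the heads can be driven from a single backbone pass. A sketch — note I’m assuming each converted head keeps SafetyNetLight’s input blob name, which should really be read from get_inputs() per session; the stand-in session below lets the sketch run without the .onnx files:

```python
import numpy as np

def run_heads(sceneprint, sessions, input_name="image_embed_normalize_out"):
    """Run every classifier head on one backbone embedding.
    input_name is SafetyNetLight's input blob; assuming the other converted
    heads expose the same name may not hold -- when in doubt, use
    session.get_inputs()[0].name instead."""
    feed = {input_name: sceneprint.reshape(1, 768, 1, 1).astype(np.float32)}
    return {name: sess.run(None, feed)[0].flatten() for name, sess in sessions.items()}

# Stand-in session; with onnxruntime each value would be
# ort.InferenceSession(f"{head}.onnx").
class FakeSession:
    def __init__(self, n_out):
        self.n_out = n_out
    def run(self, output_names, feed):
        return [np.zeros((1, self.n_out, 1, 1), dtype=np.float32)]

sessions = {"SafetyNetLight": FakeSession(10), "CityNature": FakeSession(3)}
scores = run_heads(np.zeros(768, dtype=np.float32), sessions)
```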
Complete inference script
Here’s a full Python script that runs the backbone and all classifier heads:
import argparse
import os
import sys
import subprocess
import numpy as np

SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
SCENENET_MODEL = os.path.join(SCRIPT_DIR, "scenenet_classifier.onnx")
SAFETY_MODEL = os.path.join(SCRIPT_DIR, "safetynet_model.onnx")

SAFETY_CATEGORIES = [
    "unsafe", "sexual", "violence", "gore", "weapon_violence",
    "weapon_any", "drugs", "medically_sensitive", "riot_looting",
    "terrorist_hate_groups"
]

TAXONOMY_BASE = (
    "/System/Library/PrivateFrameworks/CoreSceneUnderstanding.framework"
    "/Versions/A/Resources/taxonomies"
)

def load_bplist(path):
    if not os.path.exists(path):
        return None
    try:
        import plistlib
        result = subprocess.run(
            ["plutil", "-convert", "xml1", "-o", "-", path],
            capture_output=True, check=True)
        return plistlib.loads(result.stdout)
    except Exception:
        return None

def load_taxonomy_labels():
    labels = {
        "scene_vocab": None,
        "entity_vocab": None,
        "human_readable": None,
    }
    en_base = f"{TAXONOMY_BASE}/EntityNet/v0b"
    labels["human_readable"] = load_bplist(f"{en_base}/ENv0b_attribute__humanReadableLabel.bplist")
    labels["scene_vocab"] = load_bplist(f"{en_base}/ENv0b_vocabulary00__scenenet_leaf.bplist")
    labels["entity_vocab"] = load_bplist(f"{en_base}/ENv0b_vocabulary02__entitynet.bplist")
    return labels

def get_label(labels, vocab_key, index):
    vocab = labels.get(vocab_key)
    hr = labels.get("human_readable")
    if vocab and index < len(vocab):
        ident = vocab[index]
        if hr and ident in hr:
            return hr[ident]
        if ident:
            return ident
    return f"[{index}]"

def preprocess_image(image_path):
    from PIL import Image
    img = Image.open(image_path).convert("RGB").resize((360, 360))
    x = np.array(img, dtype=np.float32) / 127.5 - 1.0
    x = x.transpose(2, 0, 1)[np.newaxis, ...]
    return x

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("image", help="Path to input image")
    parser.add_argument("--only-human", action="store_true", help="Skip entities without human-readable labels")
    args = parser.parse_args()
    image_path = args.image
    if not os.path.exists(image_path):
        print(f"Error: Image not found: {image_path}")
        sys.exit(1)
    if not os.path.exists(SCENENET_MODEL):
        print(f"Error: Model not found: {SCENENET_MODEL}")
        sys.exit(1)
    if not os.path.exists(SAFETY_MODEL):
        print(f"Error: Safety model not found: {SAFETY_MODEL}")
        sys.exit(1)
    import onnxruntime as ort
    print("Loading SceneNet v5 model...")
    sess = ort.InferenceSession(SCENENET_MODEL)
    labels = load_taxonomy_labels()
    has_labels = labels["human_readable"] is not None
    x = preprocess_image(image_path)
    print(f"Image: {image_path}")
    print("Running inference...")
    outputs = sess.run(None, {"image": x})
    output_map = {sess.get_outputs()[i].name: outputs[i] for i in range(len(outputs))}
    top_n = 25
    if "classification/labels" in output_map:
        scores = output_map["classification/labels"].flatten()
        top_idx = np.argsort(scores)[::-1][:top_n]
        print(f"\n{'═' * 55}")
        print(f"SCENE CLASSIFICATION (top {top_n} of {len(scores)})")
        print(f"{'═' * 55}")
        for idx in top_idx:
            label = get_label(labels, "scene_vocab", idx) if has_labels else f"[{idx}]"
            bar = "█" * max(1, int(scores[idx] * 40))
            print(f" {scores[idx]:7.4f} {bar} {label}")
    if "entitynet/labels" in output_map:
        scores = output_map["entitynet/labels"].flatten()
        print(f"\n{'═' * 55}")
        print(f"ENTITY RECOGNITION (top {top_n} of {len(scores)})")
        print(f"{'═' * 55}")
        count = 0
        for idx in np.argsort(scores)[::-1]:
            label = get_label(labels, "entity_vocab", idx) if has_labels else f"[{idx}]"
            if args.only_human and label.startswith("["):
                continue
            bar = "█" * max(1, int(scores[idx] * 40))
            print(f" {scores[idx]:7.4f} {bar} {label}")
            count += 1
            if count >= top_n:
                break
    if "inner/sceneprint" in output_map:
        sp = output_map["inner/sceneprint"]
        sp_flat = sp.flatten()
        print(f"\n{'═' * 55}")
        print(f"SCENEPRINT: 768-d, range=[{sp_flat.min():.4f}, {sp_flat.max():.4f}]")
        safety_sess = ort.InferenceSession(SAFETY_MODEL)
        safety_out = safety_sess.run(None, {"image_embed_normalize_out": sp.reshape(1, 768, 1, 1)})
        for i, o in enumerate(safety_sess.get_outputs()):
            if "post_act" in o.name:
                probs = safety_out[i].flatten()
                print(f"\n{'═' * 55}")
                print("SAFETY CLASSIFICATION")
                print(f"{'═' * 55}")
                for cat, prob in zip(SAFETY_CATEGORIES, probs):
                    print(f" {prob:7.4f} {cat}")
                break

if __name__ == "__main__":
    main()
There are thousands of entity recognition categories, many of which lack human-readable labels, so I added a --only-human flag to filter them out.
python run_scenenet_classifier.py /tmp/sf-chinatown.jpeg --only-human
Loading SceneNet v5 model...
Image: /tmp/sf-chinatown.jpeg
Running inference...
═══════════════════════════════════════════════════════
SCENE CLASSIFICATION (top 25 of 1374)
═══════════════════════════════════════════════════════
0.0300 █ Raw Metal
0.0209 █ Circuit Board
0.0180 █ Screenshot
0.0083 █ Textile
0.0077 █ Puzzles
0.0065 █ Gears
0.0060 █ Adult
0.0052 █ Raw Glass
0.0047 █ Foliage
0.0045 █ [904]
0.0042 █ Fence
0.0040 █ Spiderweb
0.0038 █ Wood Processed
0.0037 █ Polka Dots
0.0037 █ Jigsaw
0.0035 █ Map
0.0033 █ Cactus
0.0027 █ Diagram
0.0023 █ [1168]
0.0023 █ Handwriting
0.0022 █ Branch
0.0018 █ Window
0.0018 █ Spider
0.0018 █ Illustrations
0.0017 █ [1355]
═══════════════════════════════════════════════════════
ENTITY RECOGNITION (top 25 of 7287)
═══════════════════════════════════════════════════════
0.7122 ████████████████████████████ Pattern
0.7117 ████████████████████████████ Green
0.6554 ██████████████████████████ Turquoise
0.6193 ████████████████████████ Red
0.6068 ████████████████████████ Pink
0.5983 ███████████████████████ Yellow
0.5917 ███████████████████████ Text
0.5350 █████████████████████ Font
0.5038 ████████████████████ Purple
0.4920 ███████████████████ Fractal Art
0.4595 ██████████████████ Violet
0.4479 █████████████████ Monochrome
0.4137 ████████████████ Line
0.4115 ████████████████ Drawing
0.4043 ████████████████ Close-Up
0.4014 ████████████████ Lighting
0.3920 ███████████████ Reef
0.3885 ███████████████ Lavender
0.3857 ███████████████ Black and White
0.3758 ███████████████ Tartan
0.3748 ██████████████ Symmetry
0.3746 ██████████████ Hair
0.3741 ██████████████ Spine
0.3740 ██████████████ Face
0.3667 ██████████████ Cartoon
═══════════════════════════════════════════════════════
SCENEPRINT: 768-d, range=[-0.8027, 0.3495]
═══════════════════════════════════════════════════════
SAFETY CLASSIFICATION
═══════════════════════════════════════════════════════
0.0037 unsafe
0.0001 sexual
0.0001 violence
0.0002 gore
0.0000 weapon_violence
0.0016 weapon_any
0.0011 drugs
0.0000 medically_sensitive
0.0000 riot_looting
0.0000 terrorist_hate_groups
The next post will be an even deeper dive into the Apple SummarizationKit models, which are also found in CoreSceneUnderstanding.framework.