Building a CoreML Vision Pipeline in SwiftUI
Real-time object detection with custom models
Introduction
Building production-ready computer vision features on iOS requires understanding both the CoreML framework and the Vision framework. In this guide, we'll walk through creating a real-time object detection pipeline that processes camera frames as they arrive.
Why CoreML + Vision?
Apple's CoreML framework provides hardware-accelerated machine learning inference, while Vision offers high-level APIs for common computer vision tasks. Together, they create a powerful combination for building sophisticated visual recognition features.
Key Benefits
- Hardware Acceleration: Automatic optimization for Neural Engine, GPU, and CPU
- Privacy-First: All processing happens on-device
- Battery Efficient: Optimized power consumption compared to custom implementations
- Easy Integration: Swift-native APIs that work seamlessly with SwiftUI
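The hardware-acceleration point above can be influenced at load time through MLModelConfiguration. A minimal sketch (the `.all` default lets CoreML choose among Neural Engine, GPU, and CPU; restricting it is mainly useful for debugging or benchmarking):

```swift
import CoreML

// Sketch: pick a compute-unit preference before loading a model.
let config = MLModelConfiguration()
config.computeUnits = .all        // default: let CoreML decide
// config.computeUnits = .cpuOnly // useful for reproducible debugging
```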
Setting Up Your Vision Pipeline
First, let's create a basic camera capture session that feeds frames to our Vision pipeline:
```swift
import AVFoundation
import Vision
import CoreML

final class VisionPipeline: NSObject, ObservableObject {
    @Published var detectedObjects: [VNRecognizedObjectObservation] = []

    // Exposed (not private) so the SwiftUI preview view can attach to it.
    let captureSession = AVCaptureSession()
    private let videoOutput = AVCaptureVideoDataOutput()
    private let processingQueue = DispatchQueue(label: "vision.queue")

    // Wrap the generated CoreML model class for use with Vision.
    private lazy var visionModel: VNCoreMLModel? = {
        guard let model = try? YOLOv3(configuration: MLModelConfiguration()) else { return nil }
        return try? VNCoreMLModel(for: model.model)
    }()

    func startCapture() {
        guard let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video,
                                                   position: .back) else { return }
        do {
            let input = try AVCaptureDeviceInput(device: camera)
            guard captureSession.canAddInput(input),
                  captureSession.canAddOutput(videoOutput) else { return }
            captureSession.addInput(input)
            captureSession.addOutput(videoOutput)
            videoOutput.setSampleBufferDelegate(self, queue: processingQueue)

            // startRunning() blocks until the session starts, so keep it
            // off the main thread.
            processingQueue.async { [captureSession] in
                captureSession.startRunning()
            }
        } catch {
            print("Camera setup failed: \(error)")
        }
    }
}
```
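One prerequisite the capture code assumes: the app must hold camera permission (and declare `NSCameraUsageDescription` in Info.plist). A hedged sketch of a helper you might call before `startCapture()` — the function name is an assumption, not part of any Apple API:

```swift
import AVFoundation

// Sketch: resolve camera authorization before starting the session.
// NSCameraUsageDescription must also be present in Info.plist.
func requestCameraAccess(completion: @escaping (Bool) -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        completion(true)
    case .notDetermined:
        // Prompts the user; the handler may run on an arbitrary queue.
        AVCaptureDevice.requestAccess(for: .video, completionHandler: completion)
    default:
        completion(false) // denied or restricted
    }
}
```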
Processing Frames with Vision
The key to real-time performance is efficient frame processing. Here's how to handle incoming video frames:
```swift
extension VisionPipeline: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer),
              let model = visionModel else { return }

        let request = VNCoreMLRequest(model: model) { [weak self] request, _ in
            guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
            // Publish UI state on the main thread, keeping only confident detections.
            DispatchQueue.main.async {
                self?.detectedObjects = results.filter { $0.confidence > 0.7 }
            }
        }
        // .scaleFill stretches the frame to the model's input size; prefer
        // .centerCrop if your model is sensitive to aspect-ratio distortion.
        request.imageCropAndScaleOption = .scaleFill

        // Back-camera frames arrive rotated; .right is correct for portrait.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .right,
                                            options: [:])
        do {
            try handler.perform([request])
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
```
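One detail worth calling out before drawing boxes: `VNRecognizedObjectObservation.boundingBox` is normalized to 0...1 with its origin at the bottom-left, while SwiftUI uses a top-left origin. A small sketch of the conversion (the function name is an assumption):

```swift
import Foundation

/// Converts a Vision normalized bounding box (0...1, origin bottom-left)
/// into a rect in a top-left-origin view coordinate space.
func viewRect(forNormalized box: CGRect, in viewSize: CGSize) -> CGRect {
    CGRect(
        x: box.minX * viewSize.width,
        y: (1 - box.maxY) * viewSize.height, // flip the y-axis
        width: box.width * viewSize.width,
        height: box.height * viewSize.height
    )
}
```

A `BoundingBoxView` can call this with the `GeometryReader` size to position each overlay.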
SwiftUI Integration
Now let's create a SwiftUI view that displays the camera feed with bounding boxes:
```swift
struct VisionCameraView: View {
    @StateObject private var pipeline = VisionPipeline()

    var body: some View {
        ZStack {
            // Live camera preview wrapping the pipeline's capture session.
            CameraPreviewView(session: pipeline.captureSession)

            // Overlay one bounding box per detection.
            GeometryReader { geometry in
                ForEach(pipeline.detectedObjects, id: \.uuid) { observation in
                    BoundingBoxView(observation: observation, geometry: geometry)
                }
            }
        }
        .onAppear { pipeline.startCapture() }
    }
}
```
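The view above references a `CameraPreviewView`. A minimal sketch of one possible implementation, wrapping `AVCaptureVideoPreviewLayer` in a `UIViewRepresentable` (the type names and structure here are assumptions, not a fixed API):

```swift
import SwiftUI
import AVFoundation

// Sketch: bridge an AVCaptureSession's preview layer into SwiftUI.
struct CameraPreviewView: UIViewRepresentable {
    let session: AVCaptureSession

    // Backing the view with the preview layer keeps it resized automatically.
    final class PreviewView: UIView {
        override class var layerClass: AnyClass { AVCaptureVideoPreviewLayer.self }
        var previewLayer: AVCaptureVideoPreviewLayer { layer as! AVCaptureVideoPreviewLayer }
    }

    func makeUIView(context: Context) -> PreviewView {
        let view = PreviewView()
        view.previewLayer.session = session
        view.previewLayer.videoGravity = .resizeAspectFill
        return view
    }

    func updateUIView(_ uiView: PreviewView, context: Context) {}
}
```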
Conclusion
Building a CoreML Vision pipeline requires careful attention to performance and threading, but the results are worth it. With hardware acceleration and Apple's optimized frameworks, you can build sophisticated computer vision features that run smoothly on modern iOS devices.
Happy coding! 🚀