Live camera feed in SwiftUI with AVCaptureVideoPreviewLayer


In this post, we are going to build a SwiftUI-based app which shows the live camera feed on the screen with rotation support. The app is built with future image processing in mind, so that for example an object detection model can easily be added. Note that this is different from my DriverAssistant app, which uses SwiftUI embedded in a Storyboard-based interface.

This app uses UIKit and the AVFoundation framework for handling the camera. You can get the code for this post here.

If you want to learn more about views, controllers, and layers, and how to work with them, you can check out this post or the video.

And here is a video version of the post.

Adding UIKit - Hosting ViewController

The first thing we do is create a new SwiftUI app in Xcode and add a file called ViewController.swift to it, see figure 1. In ContentView, we replace the "Hello, world!" text with

HostedViewController()
   .ignoresSafeArea()

Xcode now reports an error, since we have not defined HostedViewController yet.

Figure 1: SwiftUI project with a file ViewController.

To fix this, we go into the ViewController.swift file and add the three imports UIKit, SwiftUI, and AVFoundation. We need the AVFoundation framework to access the camera.

We create the UIViewController class ViewController which is going to contain all of the logic to present the camera feed on the screen.

import UIKit
import SwiftUI
import AVFoundation

class ViewController: UIViewController {

}

To add this controller to SwiftUI we create a UIViewControllerRepresentable which wraps it in a SwiftUI view. Let’s add the following code below the ViewController class.

struct HostedViewController: UIViewControllerRepresentable {
    func makeUIViewController(context: Context) -> UIViewController {
        return ViewController()
    }

    func updateUIViewController(_ uiViewController: UIViewController, context: Context) {
    }
}

When we now go back to the ContentView we see that the error has disappeared. The controller’s underlying view is shown on the screen as a SwiftUI view.

Setting up ViewController

The controller has two tasks here. It checks if the app has permission to access the camera, and if so sets up the capture session to present the feed.

The first thing we do in the controller is to define some variables.

private var permissionGranted = false // Flag for permission

private let captureSession = AVCaptureSession()
private let sessionQueue = DispatchQueue(label: "sessionQueue")

private var previewLayer = AVCaptureVideoPreviewLayer()
var screenRect: CGRect! = nil // For view dimensions

The first one lets us handle the control flow depending on whether the user has granted access to the camera or not.

The next two are required for accessing the camera, while the last two deal with presenting the camera feed. We will go over all of them in detail later.

We override the viewDidLoad method to check for permission and start the capture session once the app is opened.

import UIKit
import SwiftUI
import AVFoundation

class ViewController: UIViewController {
    private var permissionGranted = false // Flag for permission
    private let captureSession = AVCaptureSession()
    private let sessionQueue = DispatchQueue(label: "sessionQueue")
    private var previewLayer = AVCaptureVideoPreviewLayer()
    var screenRect: CGRect! = nil // For view dimensions

    override func viewDidLoad() {
        super.viewDidLoad()
        checkPermission()

        sessionQueue.async { [unowned self] in
            guard self.permissionGranted else { return }
            self.setupCaptureSession()
            self.captureSession.startRunning()
        }
    }
}

The functions we call here are implemented and explained in detail below. checkPermission() checks whether the user has granted permission to use the camera. We then use sessionQueue to set up and start the capture session, but only if the app has permission to access the camera.

Requesting camera access permission

Before our app can access the camera for the first time, the user has to grant permission to do so. For iOS to ask the user, the app must provide a camera usage description, the "Privacy - Camera Usage Description" entry (key NSCameraUsageDescription), see figure 2. To add it, select the project, then the app target and its Info tab, and add a new entry by hovering over an existing entry and clicking the plus symbol.

Figure 2: Adding camera usage description to allow the app to access the camera.
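Without this entry, iOS terminates the app as soon as it tries to access the camera, which is easy to miss when setting up a new project. The sketch below is not part of the original project: hasCameraUsageDescription is a hypothetical helper that checks an Info.plist dictionary for a non-empty NSCameraUsageDescription string, the raw key behind the "Privacy - Camera Usage Description" entry. Taking the dictionary as a parameter keeps it testable; in the app you would pass Bundle.main.infoDictionary.

```swift
import Foundation

// Hypothetical helper: true if the Info.plist dictionary contains a
// non-empty camera usage description (key NSCameraUsageDescription).
func hasCameraUsageDescription(_ info: [String: Any]?) -> Bool {
    guard let text = info?["NSCameraUsageDescription"] as? String else { return false }
    return !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
}

// Possible use in a debug build, e.g. at the start of checkPermission():
// assert(hasCameraUsageDescription(Bundle.main.infoDictionary),
//        "Add NSCameraUsageDescription to the app's Info.plist")
```

An assert like the commented one stops a debug build with a readable message instead of a privacy-related termination, and is compiled out of release builds.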

The user’s decision is persisted on the device. Whenever the app starts, we read this decision with AVCaptureDevice.authorizationStatus(for: .video) and set our flag permissionGranted accordingly.

func checkPermission() {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    // Permission has been granted before
    case .authorized:
        permissionGranted = true

    // Permission has not been requested yet
    case .notDetermined:
        requestPermission()

    // Permission has been denied or is restricted
    default:
        permissionGranted = false
    }
}

If the user has not been asked yet, we request permission.

func requestPermission() {
    sessionQueue.suspend()
    AVCaptureDevice.requestAccess(for: .video) { [unowned self] granted in
        self.permissionGranted = granted
        self.sessionQueue.resume()
    }
}

Back in viewDidLoad, the capture session is set up on the DispatchQueue sessionQueue, and the first thing that block does is check our flag. Since requesting permission is asynchronous, requestPermission() suspends the session queue before making the request and resumes it in the completion handler. This ensures that the capture session is configured only once the user has made a decision.
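The suspend-and-resume pattern can be reproduced without a camera, which makes its effect easy to observe. In the sketch below, simulatedRequestAccess is a hypothetical stand-in for AVCaptureDevice.requestAccess(for:completionHandler:), which likewise calls its handler later on an arbitrary queue; PermissionFlow is a hypothetical class that mirrors requestPermission() and the sessionQueue.async block in viewDidLoad, and records the order in which things happen.

```swift
import Foundation

// Hypothetical stand-in for AVCaptureDevice.requestAccess(for:completionHandler:):
// reports the "user's" decision later, on a different queue.
func simulatedRequestAccess(granted: Bool, completion: @escaping (Bool) -> Void) {
    DispatchQueue.global().asyncAfter(deadline: .now() + 0.1) {
        completion(granted)
    }
}

final class PermissionFlow {
    let sessionQueue = DispatchQueue(label: "sessionQueue")
    private(set) var permissionGranted = false
    private(set) var events: [String] = []

    // Mirrors requestPermission(): nothing submitted to sessionQueue
    // starts running until resume() is called in the completion handler.
    func requestPermission(granted: Bool) {
        sessionQueue.suspend()
        simulatedRequestAccess(granted: granted) { result in
            self.permissionGranted = result
            self.events.append("decision")
            self.sessionQueue.resume()
        }
    }

    // Mirrors the sessionQueue.async block in viewDidLoad.
    func startSession(completion: @escaping () -> Void) {
        sessionQueue.async {
            self.events.append(self.permissionGranted ? "configured" : "skipped")
            completion()
        }
    }
}
```

If you remove the suspend() call (and the matching resume()), the block enqueued by startSession typically runs before the simulated decision arrives and records "skipped" even when access is granted.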

Setting up capture session

Now that we have the permission handled, let’s implement the function setupCaptureSession().

A capture session lets us access devices such as the camera and provide the captured data to other objects, such as the preview layer in our case. Figure 3 shows the components of the capture session which we will need here.

Figure 3: Capture session (centre) with camera as input and a preview layer as output.

We need one input, the camera, and one output, the previewLayer. On the connection between them, we can also set the orientation of the incoming frames.

In the code, we first add the camera. Depending on your device, you might have to select a different camera type in the AVCaptureDevice.default call.

func setupCaptureSession() {
    // Access camera
    guard let videoDevice = AVCaptureDevice.default(.builtInDualWideCamera, for: .video, position: .back) else { return }
    guard let videoDeviceInput = try? AVCaptureDeviceInput(device: videoDevice) else { return }

    guard captureSession.canAddInput(videoDeviceInput) else { return }
    captureSession.addInput(videoDeviceInput)

    // TODO: Add preview layer
}
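The .builtInDualWideCamera device only exists on iPhones with an ultra-wide camera, so on other devices the guard above returns early and the screen stays black. One way around this, sketched below, is to try several device types in order of preference and fall back to the single wide-angle camera. The function name backVideoDevice and the preference order are our own assumptions, and the sketch assumes a deployment target of iOS 13 or later.

```swift
import AVFoundation

// Hypothetical helper: returns the first available back camera, preferring
// the virtual dual-wide device and falling back to the single wide-angle camera.
func backVideoDevice() -> AVCaptureDevice? {
    let preferredTypes: [AVCaptureDevice.DeviceType] = [
        .builtInDualWideCamera,   // wide + ultra-wide
        .builtInDualCamera,       // wide + telephoto
        .builtInWideAngleCamera   // single wide-angle camera
    ]
    for type in preferredTypes {
        if let device = AVCaptureDevice.default(type, for: .video, position: .back) {
            return device
        }
    }
    return nil // e.g. on the simulator, which has no camera
}
```

In setupCaptureSession(), the first guard would then read guard let videoDevice = backVideoDevice() else { return }.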

Adding live preview

Let’s implement the previewLayer as the output (right side in figure 3). AVFoundation gives us the AVCaptureVideoPreviewLayer class to present a camera feed. This is a CALayer which we can add to our view/layer hierarchy. To learn more about layers and how to position them check out this post.

To add the layer, we have to set its size and position. Since the dimensions depend on the device's orientation, we save the current screen bounds in the screenRect variable and use them to set the layer's frame property. Additionally, we set the orientation of the incoming frames to portrait. This code replaces the TODO at the end of setupCaptureSession().

screenRect = UIScreen.main.bounds  

previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer.frame = CGRect(x: 0, y: 0, width: screenRect.size.width, height: screenRect.size.height)
previewLayer.videoGravity = AVLayerVideoGravity.resizeAspectFill // Fill screen

previewLayer.connection?.videoOrientation = .portrait

All of this setup runs on our sessionQueue. Since UI updates must be performed on the main queue, we add our preview layer to the root view's layer there.

// Updates to UI must be on main queue
DispatchQueue.main.async { [weak self] in
    guard let self = self else { return }
    self.view.layer.addSublayer(self.previewLayer)
}
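Put together, setupCaptureSession() could look like the sketch below. It trims the class to the properties this method uses and otherwise follows the snippets above; the main-queue block unwraps the weak reference with guard let, so it simply does nothing if the controller has already been deallocated.

```swift
import UIKit
import AVFoundation

class ViewController: UIViewController {
    private let captureSession = AVCaptureSession()
    private var previewLayer = AVCaptureVideoPreviewLayer()
    var screenRect: CGRect! = nil // For view dimensions

    func setupCaptureSession() {
        // Input: access camera (left side of figure 3)
        guard let videoDevice = AVCaptureDevice.default(.builtInDualWideCamera, for: .video, position: .back) else { return }
        guard let videoDeviceInput = try? AVCaptureDeviceInput(device: videoDevice) else { return }
        guard captureSession.canAddInput(videoDeviceInput) else { return }
        captureSession.addInput(videoDeviceInput)

        // Output: preview layer (right side of figure 3)
        screenRect = UIScreen.main.bounds
        previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
        previewLayer.frame = CGRect(x: 0, y: 0, width: screenRect.size.width, height: screenRect.size.height)
        previewLayer.videoGravity = .resizeAspectFill // Fill screen
        previewLayer.connection?.videoOrientation = .portrait

        // Updates to UI must be on main queue
        DispatchQueue.main.async { [weak self] in
            guard let self = self else { return }
            self.view.layer.addSublayer(self.previewLayer)
        }
    }
}
```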

When we run the app now, it shows a full-screen camera feed, see figure 4.

Figure 4: PreviewLayer presents camera feed.

Adding device rotation

The app works great as long as we open it in portrait orientation and leave it there. As soon as we rotate the device, the interface rotates with it, but the dimensions of the preview layer no longer match the screen, which leaves a large part of the screen unused. Additionally, the camera feed now appears rotated, see figure 5.

Figure 5: Rotation of device not handled.

We can fix this by reading the device's orientation whenever it changes and updating the layer's dimensions as well as the orientation of the incoming frames.

To detect that the orientation has changed, we override the controller's willTransition(to:with:) method.

override func willTransition(to newCollection: UITraitCollection, with coordinator: UIViewControllerTransitionCoordinator) {

}

When this method is triggered, we update our variable screenRect with the current screen dimensions and resize the preview layer accordingly. Then we read the device's orientation and update the orientation of the incoming frames on the preview layer's connection.

screenRect = UIScreen.main.bounds
self.previewLayer.frame = CGRect(x: 0, y: 0, width: screenRect.size.width, height: screenRect.size.height)

switch UIDevice.current.orientation {
    // Home button on top
    case UIDeviceOrientation.portraitUpsideDown:
        self.previewLayer.connection?.videoOrientation = .portraitUpsideDown
             
    // Home button on right
    case UIDeviceOrientation.landscapeLeft:
        self.previewLayer.connection?.videoOrientation = .landscapeRight
            
    // Home button on left
    case UIDeviceOrientation.landscapeRight:
        self.previewLayer.connection?.videoOrientation = .landscapeLeft
             
    // Home button at bottom
    case UIDeviceOrientation.portrait:
        self.previewLayer.connection?.videoOrientation = .portrait
                
    default:
        break
}

Note that the device and the layer name the two landscape orientations the other way round. UIDeviceOrientation.landscapeLeft means the home button is on the right, while AVCaptureVideoOrientation.landscapeLeft means the home button is on the left. So when the device is in landscapeLeft, the layer has to be set to landscapeRight, see figure 6.

Figure 6: Landscape left and right are different for device and layer since they use the bottom and top respectively as reference.

When we run our app again and rotate the device we see that the preview feed now works correctly in all orientations, see figure 7.

Figure 7: We update dimensions when the device's orientation changes.
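One caveat: willTransition(to:with:) is driven by trait-collection changes, so it is only called when a size class changes. As far as we know, a direct 180° turn (landscape left to landscape right) keeps the same traits on iPhone, and on iPad rotation does not change the size classes at all, so in those cases the method does not fire. A possible alternative, sketched below under that assumption, is to resize the layer in viewDidLayoutSubviews, which runs after every layout pass, and to update the video orientation from UIDevice.orientationDidChangeNotification, which is posted for every change of device orientation. PreviewViewController and videoOrientation(for:) are hypothetical names, and the capture session setup is omitted.

```swift
import UIKit
import AVFoundation

// Hypothetical controller that only shows the rotation handling. Attach the
// session elsewhere, e.g. previewLayer.session = captureSession.
class PreviewViewController: UIViewController {
    let previewLayer = AVCaptureVideoPreviewLayer()
    private var orientationObserver: NSObjectProtocol?

    // Same mapping as the switch above: the two enums name the landscape cases
    // the other way round. Returns nil for faceUp, faceDown, and unknown.
    static func videoOrientation(for orientation: UIDeviceOrientation) -> AVCaptureVideoOrientation? {
        switch orientation {
        case .portrait: return .portrait
        case .portraitUpsideDown: return .portraitUpsideDown
        case .landscapeLeft: return .landscapeRight
        case .landscapeRight: return .landscapeLeft
        default: return nil
        }
    }

    override func viewDidLoad() {
        super.viewDidLoad()
        previewLayer.videoGravity = .resizeAspectFill
        view.layer.addSublayer(previewLayer)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        UIDevice.current.beginGeneratingDeviceOrientationNotifications()
        // Posted for every device orientation change, including 180-degree turns.
        orientationObserver = NotificationCenter.default.addObserver(
            forName: UIDevice.orientationDidChangeNotification, object: nil, queue: .main
        ) { [weak self] _ in
            guard let self = self,
                  let newOrientation = PreviewViewController.videoOrientation(for: UIDevice.current.orientation)
            else { return } // keep the current orientation for faceUp/faceDown/unknown
            self.previewLayer.connection?.videoOrientation = newOrientation
        }
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        if let observer = orientationObserver {
            NotificationCenter.default.removeObserver(observer)
            orientationObserver = nil
        }
        UIDevice.current.endGeneratingDeviceOrientationNotifications()
    }

    // Runs after every layout pass, so the layer always matches the view's current size.
    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        previewLayer.frame = view.bounds
    }
}
```

Using view.bounds instead of UIScreen.main.bounds also keeps the layer correct if the controller does not fill the whole screen.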

Why not use SwiftUI only?

While it is possible to build what we just did without UIKit (see my video), that approach makes it impractical to run a preview and a detection model in parallel.

Detecting objects is computationally expensive and, depending on the size of the model, can take much longer than presenting a frame in the preview. That is why we want the preview to run on one thread and the detection in parallel on a separate one. That way we can skip frames for the detection if necessary while maintaining a smooth camera feed.
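The frame-skipping idea can be expressed as a small gate that admits a frame only while the detector is idle. This is a hypothetical helper, not part of the post's code, and it matters mainly when detection runs asynchronously, off the queue that delivers the frames: the frame callback calls tryBegin() and skips the frame if it returns false, and the detector calls finish() when it is done. An NSLock keeps the state consistent because the two calls may come from different queues.

```swift
import Foundation

// Hypothetical helper: admits at most one frame at a time into a slow detector
// and counts the frames it turns away.
final class DetectionGate {
    private let lock = NSLock()
    private var busy = false
    private var dropped = 0

    /// Number of frames skipped because a detection was still running.
    var droppedFrames: Int {
        lock.lock()
        defer { lock.unlock() }
        return dropped
    }

    /// Returns true if the caller may run detection on this frame;
    /// returns false (and counts a dropped frame) while a detection is in progress.
    func tryBegin() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if busy {
            dropped += 1
            return false
        }
        busy = true
        return true
    }

    /// Call once detection on the admitted frame has finished.
    func finish() {
        lock.lock()
        defer { lock.unlock() }
        busy = false
    }
}
```

The preview never passes through such a gate, so however many frames the detector skips, the feed on screen stays smooth.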

Here is why we must use UIKit. With SwiftUI alone, we take the frames from a pixel buffer output of the capture session instead of a preview layer. We could read them for both the preview and the detection, but then the slower detection task would make the preview appear laggy. Speeding up the detection by dropping frames in the buffer would have the same result. One solution could be to have two buffers: one from which we read all frames for the preview, and a second one from which we drop frames if the detection cannot keep up with the preview's frame rate. Unfortunately, it is not possible to use two pixel buffers with the same capture session. Likewise, it is not possible to connect the same camera to two capture sessions.

So our only feasible option is to use the preview layer for the feed and the buffer for providing frames for the detector. Since the AVCaptureVideoPreviewLayer is a Core Animation layer, we must add it via a hosted view to SwiftUI.
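In AVFoundation terms, this means adding an AVCaptureVideoDataOutput to the same session that feeds the preview layer. The sketch below is an assumption about how that could look, not the code from the next post: FrameHandler is a hypothetical name, the BGRA pixel format is just a common choice, and the detector call is left as a comment. alwaysDiscardsLateVideoFrames (true by default, set explicitly here) lets AVFoundation drop frames that arrive while the delegate is still busy, and because the delegate runs on its own queue, the preview layer is not held up.

```swift
import AVFoundation
import CoreVideo

// Hypothetical frame consumer for a detector, running next to the preview layer.
final class FrameHandler: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let output = AVCaptureVideoDataOutput()
    private let videoQueue = DispatchQueue(label: "videoQueue")

    /// Adds the data output to the session; call it on sessionQueue before
    /// startRunning(), e.g. inside setupCaptureSession(). Returns false if refused.
    func attach(to session: AVCaptureSession) -> Bool {
        // Drop frames that arrive while the delegate is still busy (also the default).
        output.alwaysDiscardsLateVideoFrames = true
        output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
        output.setSampleBufferDelegate(self, queue: videoQueue)
        guard session.canAddOutput(output) else { return false }
        session.addOutput(output)
        return true
    }

    // Called on videoQueue for every frame that was not discarded.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // Run the (slow) detector on pixelBuffer here.
        _ = pixelBuffer
    }
}
```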

Conclusion

We have implemented a basic app with SwiftUI and UIKit which reads frames from the camera and presents them in a live feed on the screen. The app also supports all orientations of the device.

We have seen why we must use UIKit if we want to provide frames for a computationally expensive task such as an object detection model without impacting the live feed's performance. We will see how to add a detection model in the next post.