Stan Zaslavsky

Guide to 3D rendered photomontages of proposed developments and how they are being used as assessment tools

There is a bit of mystery around how 3D renders of developments are turned into accurate streetscape photo-montages: images with 3D content superimposed on a photograph to represent what the streetscape will actually look like when the development is completed.

In this article, I want to clear up some of that mystery and explain that their primary use is to present “before” and “after” scenarios, on which VCAT members base their assessments.

Previously, we discussed the important timeframes and key inputs we need to obtain when creating 3D photo-montages. The focus of this article is the underlying assumptions that enable the 3D-produced images to be used as effective assessment tools. (Have a look at this article for more information on timeframes and other key elements to keep in mind).

Visual Interpretation of 3D Photomontages

As a high-level overview of the process, 3D photo-montage renders are asked to accurately simulate physical reality: to show what a person walking along the street would see of the proposed development, and the kind of visual impact they would feel from viewing it.

However, the main challenge with digital images is that they typically have one point of focus and are produced using a single camera field of view. The reality? We have two eyes, or two “cameras” if you like, and we see objects in what is called “stereoscopic” view. With the eyes side by side, each eye takes a view of the same area from a slightly different angle. When the two images arrive simultaneously in the back of the brain, they are united into one picture.

The mind combines the two images by matching up the similarities and adding in the small differences. Those small differences between the two images can add up to a big difference in the final picture. The combined image is more than the sum of its parts. It is now a three-dimensional “stereo” picture. The advantage of a stereo picture is that we can see where objects are in relationship to us with much greater precision – so the depth dimension is much better perceived and understood with stereoscopic vision rather than monoscopic or one camera view.

Perhaps in the future, as technology and Virtual Reality (VR) devices evolve, there may be a case for presenting “real” built scenario evaluations, where the proposed building is placed into the streetscape scene and stakeholders use a VR headset to see the development in stereo. For now, though, the rendered 3D images are only being asked to show a “before” and “after” scenario, because 3D photo-montages are currently presented to stakeholders as flat images, either printed or delivered digitally.

Photography and Choice of Camera Lens

The photography collected for the “before” scenario forms a crucial element of the 3D montages. Because the images are being used to evaluate the overall streetscape context, a wider lens is typically used to capture the street.

Thus “barrel distortion” of the lens becomes a factor to consider. If the lens that is used is too wide (i.e. 14-18mm), straight lines towards the outer edges of the frame will start to visibly curve. So, in simple terms, the wider the lens, the more distortion comes into play on the outer edges of the photo, bending straight lines and affecting our “depth” perception of how far away objects are from the camera.

A commonly cited figure is that the human eye has a focal length of between 20 and 22mm. What this means is that, in appropriate circumstances, the camera should simulate this to give us, the viewers, the widest view of the street without so much distortion that it affects vertical lines and changes our perception of the impact of the subject site on the streetscape.

The subject site, however, is typically the main focus and positioned at the centre of the lens. The curvature of the lens at the centre is flattest – with negligible distortion to the straight lines of the proposed building, which helps us understand the overall perception of the development. It’s the objects on the outer edges of the photo that start to get distorted if the camera lens used gets further and further below 18mm in order to get more streetscape into view.

Another key consideration in photography is whether your camera is “full frame” or whether the photo is going to be cropped by the camera sensor. If you are using a crop frame camera, the sensor is cropping out the edges of the frame, which effectively increases the focal length. The difference in the field of view or effective focal length with a crop sensor is measured by its “Multiplier.” For example, a Nikon APS-C crop sensor has a 1.5x multiplier.

A “full-frame” camera simply means that its sensor is the same size as a frame of 35mm film, from the days when cameras used film rather than digital sensors. However, the more important factor to understand is that the full-frame camera is probably more representative of human vision, because there is no cropping when the lens used is between 20 and 22mm focal length. The diagram below shows the comparison between what is seen from a full-frame camera versus an APS-C crop sensor.

As an example, take a photo shot on a crop frame camera with a 20mm focal length lens. After the Multiplier is applied for a Nikon APS-C crop sensor, the photo is equivalent to one taken with a 30mm focal length lens (20mm x 1.5 = 30mm). The real impact of this is that the photo creates a more zoomed-in view of the streetscape, with objects appearing closer to the viewer than they would normally be perceived by the human eye.
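The Multiplier arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the field-of-view formula assumes an ideal distortion-free lens, and the 36mm sensor width is the standard width of a full-frame sensor rather than a figure from this article.

```python
import math

# Width of a full-frame sensor (same as a 35mm film frame), in millimetres.
FULL_FRAME_WIDTH_MM = 36.0


def equivalent_focal_length_mm(focal_length_mm, multiplier):
    """Full-frame equivalent of a lens mounted on a crop-sensor camera."""
    return focal_length_mm * multiplier


def horizontal_fov_deg(focal_length_mm, sensor_width_mm=FULL_FRAME_WIDTH_MM):
    """Horizontal field of view of an ideal (distortion-free) lens."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))


if __name__ == "__main__":
    equivalent = equivalent_focal_length_mm(20, 1.5)
    print(f"20mm on a 1.5x crop sensor behaves like {equivalent:.0f}mm")
    print(f"Full frame at 20mm: {horizontal_fov_deg(20):.1f} degrees")
    print(f"Full frame at {equivalent:.0f}mm: {horizontal_fov_deg(equivalent):.1f} degrees")
```

Under these assumptions, a 20mm lens on a full-frame camera covers roughly 84 degrees horizontally, while the same lens on a 1.5x crop sensor covers roughly 62 degrees, which is why less of the streetscape fits into the frame.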

Camera Height and Streetscape Composition

If the 3D photo-montages are to be used as analysis tools, we need some basic rules for taking photos that would be accepted as “typical” or normal. A commonly accepted approach is to evaluate height, distance and other qualities of objects based on something called “subjective constancy”.

Subjective or perceptual constancy is the idea that people perceive an object based on commonly accepted averages. As humans, we hold certain assumptions in our minds about size, shape, colour, distance, location and other qualities of objects, and these assumptions allow us to judge objects visually and understand the scene as a whole. Consequently, when we see people in the street, we have a certain perception in our mind about their height and the relative heights of other objects.

In Australia, according to Wikipedia, the average male height is 175.6cm and the average female height is 161.8cm. Taking the ratio as roughly 1.09 males to 1 female, we can calculate the average height of a typical person in the street to be around the 1.7m mark. With that in mind, we can appreciate why the recommended camera height for VCAT photography is ideally between 1650 and 1700mm off the ground, which is where a “typical” person would be viewing the streetscape context. The practice note from VCAT has an allowance to vary this height, provided there are very good grounds for it and they are explained in the “Statement of Methodology” report.
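As a quick check of that average, here is a small Python sketch of the weighted calculation. It assumes the 1.09 to 1 figure is used as a weighting between the male and female averages; the function name is hypothetical and used only for illustration.

```python
def weighted_average_height_cm(male_cm, female_cm, males_per_female):
    """Average height when there are `males_per_female` males for each female."""
    total_weight = males_per_female + 1.0
    return (male_cm * males_per_female + female_cm) / total_weight


if __name__ == "__main__":
    average_cm = weighted_average_height_cm(175.6, 161.8, 1.09)
    average_mm = average_cm * 10
    print(f"Average height: {average_cm:.1f} cm")
    # The recommended camera height range quoted above, in millimetres.
    print("Within 1650-1700mm:", 1650 <= average_mm <= 1700)
```

With the figures quoted above, the result comes out at about 169cm, which sits inside the recommended 1650 to 1700mm camera height range.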

Similarly, for the streetscape composition in the “before” photos, if we can see cars or people walking along the street in the photos, the overall building scale can then be estimated based on our assumed perception of the heights for typical real objects.

Accuracy in Survey Driven 3D Photomontages

To accurately deliver the “after” view of the proposed development, physical reality is simulated in digital 3D space so that the virtual space can be matched to the photo. There are certain constants about the space, such as the AHD (Australian Height Datum) survey levels of neighbouring properties, immediate terrain details and the like. These constants can be modelled in 3D space and used to match the development to the photo.

A survey of the camera points is an essential prerequisite under practice note PNVCAT2, which governs the delivery of photo-montages. The reason is that, once the position of the camera is known, a virtual camera can be created in digital space with the same settings and positioned accurately relative to the datum objects in the survey, the same information that governs the location of the 3D digital model of the development.

Surveyed markers from the streetscape are also placed in digital space based on the AHD details relevant to the datum, and blocked-out shapes based on these markers are created to simulate shapes in the photo. Once the background photo is placed in the digital space, the view from the 3D virtual camera can be rotated until the markers and blocked-out shapes align with the same points on the photo. When these match, the model of the proposed development is assumed to be in the correct location within the overall streetscape, because the neighbouring markers are matched up to the photo.
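The idea of rotating the virtual camera until the markers line up can be illustrated with a heavily simplified Python sketch. It is not the workflow of any particular 3D package: it works in plan view only (ignoring heights, camera tilt and lens distortion), assumes an ideal pinhole camera, and all function names, coordinates and settings are hypothetical.

```python
import math


def focal_length_px(focal_length_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixels for an ideal pinhole camera."""
    return focal_length_mm / sensor_width_mm * image_width_px


def project_marker_x(camera_en, yaw_deg, marker_en, f_px, image_width_px):
    """Horizontal pixel position of a surveyed marker, or None if it is not in front.

    Positions are (east, north) in metres; yaw is a compass heading in degrees.
    """
    d_east = marker_en[0] - camera_en[0]
    d_north = marker_en[1] - camera_en[1]
    bearing = math.atan2(d_east, d_north)  # clockwise from north
    relative = bearing - math.radians(yaw_deg)
    relative = math.atan2(math.sin(relative), math.cos(relative))  # wrap to +/-180 degrees
    if abs(relative) >= math.pi / 2:
        return None
    return image_width_px / 2 + f_px * math.tan(relative)


def yaw_error(camera_en, yaw_deg, markers_en, observed_x, f_px, image_width_px):
    """Sum of squared pixel differences between projected and observed markers."""
    total = 0.0
    for marker, x_observed in zip(markers_en, observed_x):
        x = project_marker_x(camera_en, yaw_deg, marker, f_px, image_width_px)
        if x is None:
            return math.inf
        total += (x - x_observed) ** 2
    return total


def fit_yaw(camera_en, markers_en, observed_x, f_px, image_width_px, step_deg=0.05):
    """Brute-force search for the heading that best aligns markers with the photo."""
    candidates = [i * step_deg for i in range(int(round(360 / step_deg)))]
    return min(
        candidates,
        key=lambda yaw: yaw_error(camera_en, yaw, markers_en, observed_x, f_px, image_width_px),
    )
```

In use, the surveyed camera position and marker positions would come from the survey, and the observed pixel positions would be read off the background photo. The search then plays the role of rotating the virtual camera until the markers sit on top of the same points in the photo.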

The only time this process may be affected is if the background photo needs to be manipulated in any way, for example if vertical lines are corrected for appearance purposes. As discussed earlier, when a photo is taken at less than 20mm focal length, objects at the extreme edges of the frame start to appear as if they are leaning, and straight lines may become curved because of the effect of the lens. On the other hand, when we see objects in the real world, our mind automatically corrects these effects and keeps straight lines straight. To simulate this, the background photo is often adjusted to remove the distorting effect of the wide lens.

The impact of this needs to be considered because, on one hand, the expectation is that the photo should have minimal tweaking done to it, but, on the other hand, presenting a photo with buildings leaning over doesn’t look right. The compromise is to straighten the verticals first and then use that corrected photo as the basis for both the “before” and the “after” images.

However, for camera matching in 3D space, if the original photo has had its verticals corrected and the surveyed markers positioned in 3D space are towards the outer edges of the frame, then we can’t expect them to match up unless the virtual camera also has its vertical lines corrected. Even then, the matching will most likely not be 100% accurate. In these cases, we need to match the markers that are closest to the proposed building, because those are less distorted in the photo and are therefore changed the least when the lines are straightened in Photoshop.

In Summary

From the factors above, I hope it can be seen that accuracy in 3D photo-montages is not a straightforward exercise, and that some assumptions need to be made in order to use the 3D images effectively as assessment tools for visual impact studies. As long as these assumptions are clearly dealt with in the statement of methodology reports and understood by the stakeholders in mediation, the photo-montages can achieve their purpose.

Future technology advances may bring Virtual Reality and the ability to visualise developments in proper 3D space, with headsets that allow us to view them in full stereo, rather than compromising on the “before” and “after” scenario that is currently the extent of what 3D photo-montages can offer.

Watch this space.

To your development success,


Stan Zaslavsky

LREA, BEng (Mech with Honours) / BTech (Industrial Design), VPELA
