Our sensory systems provide complementary information about the multimodal objects and events that are the target of perception in everyday life. Professional musicians' specialization in the auditory domain is reflected in the morphology of their brains, which has distinctive characteristics, particularly in areas related to auditory and audio-motor activity. Here, we combined diffusion tensor imaging (DTI) with a behavioral measure of visually induced gain in pitch discrimination, and we used measures of cortical thickness (CT) correlations to assess how auditory specialization and musical expertise are reflected in the structural architecture of white and grey matter relevant to audiovisual processing. Across all participants (n = 45), we found a correlation (p $<$ 0.001) between reliance on visual cues in pitch discrimination and the fractional anisotropy (FA) in the left inferior fronto-occipital fasciculus (IFOF), a structure connecting visual and auditory brain areas. Group analyses also revealed greater cortical thickness correlation between visual and auditory areas in non-musicians (n = 28) compared to musicians (n = 17), possibly reflecting musicians' auditory specialization (FDR $<$ 10%). Our results corroborate and expand current knowledge of functional specialization with a specific focus on audition, and highlight the fact that perception is essentially multimodal while uni-sensory processing is a specialized task.