• The goal of CCA is to find linear combinations $\eta=\mathbf{a}^{\prime} \mathbf{x}$ and $\phi=\mathbf{b}^{\prime} \mathbf{y}$ such that $\eta$ and $\phi$ have the largest possible correlation.

  • CCA focuses on the relationship between two sets of variables, whereas PCA focuses on the interrelationships within a single set of variables.


Population CCA

Let $\mathbf{x} \in \mathbb{R}^{q}$ and $\mathbf{y} \in \mathbb{R}^{p}$ be random vectors. Suppose that $\mathbf{x}$ and $\mathbf{y}$ have means $\boldsymbol{\mu}$ and $\boldsymbol{\nu}$. And

\[\begin{gathered} E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^{\prime}\}=\mathbf{\Sigma}_{11}\quad (q \times q)\\ E\{(\mathbf{y}-\boldsymbol{\nu})(\mathbf{y}-\boldsymbol{\nu})^{\prime}\}=\mathbf{\Sigma}_{22} \quad (p \times p)\\ E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{y}-\boldsymbol{\nu})^{\prime}\}=\mathbf{\Sigma}_{12}=\mathbf{\Sigma}_{21}^{\prime} \quad(q \times p) \end{gathered} \tag{1}\]

Let $\eta=\mathbf{a}^{\prime} \mathbf{x}$ and $\phi=\mathbf{b}^{\prime} \mathbf{y}$, then the correlation between $\eta$ and $\phi$ is

\[\begin{aligned} \rho\left(a, b\right) &= \frac{ Cov(\eta, \phi)}{\sqrt{Var(\eta)Var(\phi)}}\\ &= \frac{ Cov(a^{\prime}\mathbf{x}, b^{\prime}\mathbf{y})}{\sqrt{Var(a^{\prime}\mathbf{x})Var(b^{\prime}\mathbf{y})}} \\ &= \frac{ a^{\prime}Cov(\mathbf{x}, \mathbf{y})b}{\sqrt{a^{\prime}Var(\mathbf{x})ab^{\prime}Var(\mathbf{y})b}} \\ &=\frac{a^{\prime} \Sigma_{12} b}{\left(a^{\prime} \Sigma_{11} a b^{\prime} \Sigma_{22} b\right)^{1 / 2}} \\ \end{aligned} \tag{2}\]

Since $\rho(a,b)$ is unchanged when $\mathbf{a}$ and $\mathbf{b}$ are rescaled, to maximize $\rho(a,b)$ we may impose the constraint that $Var(\mathbf{a}^{\prime}\mathbf{x})$ and $Var(\mathbf{b}^{\prime}\mathbf{y})$ are equal to 1, i.e. \(\mathbf{a}^{\prime} \mathbf{\Sigma}_{11} \mathbf{a}=\mathbf{b}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=1\).

Hence,

\[\max \rho(a,b) \equiv \max _{a, b} \mathbf{a}^{\prime}\Sigma_{12} \mathbf{b},\quad \text { subject to } \mathbf{a}^{\prime} \mathbf{\Sigma}_{11} \mathbf{a}=\mathbf{b}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=1 \tag{3}\]

Prerequisite Theorems

Theorem 1 For $\mathbf{A}(n \times p)$ and $\mathbf{B}(p \times n)$, the non-zero eigenvalues of $\mathbf{A B}$ and $\mathbf{B} \mathbf{A}$ are the same and have the same multiplicity. If $\mathbf{x}$ is a non-trivial eigenvector of $\mathbf{A} \mathbf{B}$ for an eigenvalue $\lambda \neq 0$, then $\mathbf{y}=\mathbf{B} \mathbf{x}$ is a non-trivial eigenvector of $\mathbf{B A}$.

Corollary 1 For $\mathbf{A}(n \times p), \mathbf{B}(q \times n), \mathbf{a}(p \times 1)$, and $\mathbf{b}(q \times 1)$, the matrix $\mathbf{A} \mathbf{a b}^{\prime} \mathbf{B}$ has rank at most 1. The non-zero eigenvalue, if present, equals $\mathbf{b}^{\prime} \mathbf{B A a}$, with eigenvector $\mathbf{Aa}$.

Theorem 2 Spectral decomposition theorem

Any symmetric matrix $\mathbf{A}(p \times p)$ can be written as

\[\mathbf{A}=\Gamma \Lambda \Gamma^{\prime}=\sum_{i=1}^{p} \lambda_{i} \gamma_{(i)} \gamma_{(i)}^{\prime}\]

where $\mathbf{\Lambda}$ is a diagonal matrix of eigenvalues of $\mathbf{A}$, and $\Gamma$ is an orthogonal matrix whose columns are standardized eigenvectors.
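
As a quick numerical illustration (an illustrative sketch with a randomly generated symmetric matrix; eigen() returns the columns of $\Gamma$ in e$vectors and the eigenvalues in e$values):

## Sketch: verify the spectral decomposition numerically
set.seed(42)
M <- matrix(rnorm(9), 3, 3)
A <- (M + t(M)) / 2                      # a symmetric matrix
e <- eigen(A)                            # Gamma = e$vectors, Lambda = diag(e$values)
max(abs(A - e$vectors %*% diag(e$values) %*% t(e$vectors)))  # ~ 0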

Theorem 3 Singular value decomposition theorem

If $\mathbf{A}$ is an $(n \times p)$ matrix of rank $r$, then $\mathbf{A}$ can be written as

\[\mathbf{A}=\mathbf{U L} \mathbf{V}^{\prime}\]

where $\mathbf{U}(n \times r)$ and $\mathbf{V}(p \times r)$ are column orthonormal matrices $\left(\mathbf{U}^{\prime} \mathbf{U}=\mathbf{V}^{\prime} \mathbf{V}=I_{r}\right)$ and $\mathbf{L}$ is a diagonal matrix with positive elements.
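
R's svd() returns exactly these pieces, and a quick sketch verifies the column orthonormality:

## Sketch: verify A = U L V' and the column orthonormality
set.seed(42)
A <- matrix(rnorm(20), 5, 4)                # n = 5, p = 4, rank 4 almost surely
s <- svd(A)                                 # U = s$u, L = diag(s$d), V = s$v
max(abs(crossprod(s$u) - diag(4)))          # U'U = I_r, ~ 0
max(abs(crossprod(s$v) - diag(4)))          # V'V = I_r, ~ 0
max(abs(A - s$u %*% diag(s$d) %*% t(s$v)))  # ~ 0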

Theorem 4 Let $\mathbf{A}$ and $\mathbf{B}$ be two symmetric matrices. Suppose that $\mathbf{B}>0$. Then the maximum (minimum) of $\mathbf{x}^{\prime} \mathbf{A} \mathbf{x}$ given

\[\mathbf{x}^{\prime} \mathbf{B} \mathbf{x}=1\]

is attained when $\mathbf{x}$ is the eigenvector of $\mathbf{B}^{-1} \mathbf{A}$ corresponding to the largest (smallest) eigenvalue of $\mathbf{B}^{-1} \mathbf{A}$. Thus if $\lambda_{1}$ and $\lambda_{p}$ are the largest and smallest eigenvalues of $\mathbf{B}^{-1} \mathbf{A}$, then, subject to the constraint $\mathbf{x}^{\prime} \mathbf{B} \mathbf{x}=1$,

\[\max _{\mathbf{x}} \mathbf{x}^{\prime} \mathbf{A} \mathbf{x}=\lambda_{\mathrm{1}}, \quad \min _{\mathbf{x}} \mathbf{x}^{\prime} \mathbf{A} \mathbf{x}=\lambda_{p}\]
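
The following sketch checks Theorem 4 numerically with random matrices. (We take $\mathbf{A}$ positive semi-definite so that the first eigenvalue returned by eigen(), which sorts by modulus in the asymmetric case, is also the largest; for $\mathbf{B}>0$ and symmetric $\mathbf{A}$ the eigenvalues of $\mathbf{B}^{-1}\mathbf{A}$ are real.)

## Sketch: max of x'Ax subject to x'Bx = 1, via eigen(B^{-1} A)
set.seed(1)
A <- crossprod(matrix(rnorm(9), 3, 3))             # symmetric (here also PSD)
B <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)   # positive definite
e <- eigen(solve(B) %*% A)
x1 <- e$vectors[, 1]
x1 <- x1 / sqrt(drop(t(x1) %*% B %*% x1))          # rescale so that x'Bx = 1
drop(t(x1) %*% A %*% x1)                           # equals e$values[1], the maximum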

Theorem 5 If $\alpha=\mathbf{a}^{\prime} \mathbf{x}$ is a standardized linear combination (SLC) of $\mathbf{x}$ which is uncorrelated with the first $k$ principal components of $\mathbf{x}$, then the variance of $\alpha$ is maximized when $\alpha$ is the $(k+1)$th principal component of $\mathbf{x}$.

Definition

We first need to define some notation. Suppose that \(\mathbf{\Sigma}_{11}\) and \(\mathbf{\Sigma}_{22}\) are non-singular, and let

\[\begin{aligned} \mathbf{N}_{1}&=\mathbf{K} \mathbf{K}^{\prime} = \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \Sigma_{11}^{-1/2}, \\ \mathbf{N}_{2}&=\mathbf{K}^{\prime} \mathbf{K} =\Sigma_{22}^{-1/2} \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} \Sigma_{22}^{-1/2}, \end{aligned} \tag{4}\]

where

\[\mathbf{K}=\Sigma_{11}^{-1 / 2} \Sigma_{12} \Sigma_{22}^{-1 / 2}.\tag{5}\]

By Theorem 1, \(\mathbf{N}_1\) and \(\mathbf{N}_2\) have the same non-zero eigenvalues. Furthermore, since \(\mathbf{N}_1\) and \(\mathbf{N}_2\) are positive semi-definite matrices, the non-zero eigenvalues satisfy \(\lambda_{1} \geq \ldots \geq \lambda_{k}>0\), where \(k=\mathrm{r}\left(\Sigma_{12}\right) \leq \min (p, q)\).
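
As an illustrative sketch, we can confirm numerically that \(\mathbf{N}_1\) (\(q \times q\)) and \(\mathbf{N}_2\) (\(p \times p\)) share their non-zero eigenvalues, using simulated covariance blocks and a small inverse-square-root helper isqrt() of our own:

## Sketch: N1 and N2 have the same non-zero eigenvalues (Theorem 1)
set.seed(7)
Z <- matrix(rnorm(200 * 5), 200, 5)      # columns 1:2 play x, columns 3:5 play y
S <- cov(Z)
S11 <- S[1:2, 1:2]; S22 <- S[3:5, 3:5]; S12 <- S[1:2, 3:5]
isqrt <- function(M) { e <- eigen(M); e$vectors %*% diag(1/sqrt(e$values)) %*% t(e$vectors) }
K <- isqrt(S11) %*% S12 %*% isqrt(S22)
eigen(K %*% t(K))$values                 # the q = 2 eigenvalues of N1
eigen(t(K) %*% K)$values                 # the same two, plus one (numerically) zero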

Using the SVD and the spectral decomposition theorem, we can decompose $\mathbf{K}$ as follows:

\[\mathbf{K}=\left(\boldsymbol{\alpha}_{1}, \ldots, \boldsymbol{\alpha}_{k}\right) \mathbf{D}\left(\boldsymbol{\beta}_{1}, \ldots, \boldsymbol{\beta}_{k}\right)^{\prime} \tag{6}\]

where \(\boldsymbol{\alpha}_{i}\) and \(\boldsymbol{\beta}_{i}\) are the standardized eigenvectors of \(\mathbf{N}_{1}\) and \(\mathbf{N}_{2}\), respectively, and \(\mathbf{D}=\operatorname{diag}\left(\lambda_{1}^{1 / 2}, \ldots, \lambda_{k}^{1 / 2}\right)\). Furthermore, the eigenvectors are orthonormal:

\[\boldsymbol{\alpha}_{i}^{\prime} \boldsymbol{\alpha}_{j}=\boldsymbol{\beta}_{i}^{\prime} \boldsymbol{\beta}_{j}=\delta_{i j}= \begin{cases}1 & \text { if } i=j \\ 0 & \text { if } i \neq j\end{cases} \tag{7}\]

Hence we have the following definitions:

  • The $i$th canonical correlation vectors for \(\mathbf{x}\) and \(\mathbf{y}\) are \(\mathbf{a}_{i}\) and \(\mathbf{b}_{i}\), respectively, where

    \[\mathbf{a}_{i}=\mathbf{\Sigma}_{11}^{-1 / 2} \boldsymbol{\alpha}_{i}, \quad \mathbf{b}_{i}=\mathbf{\Sigma}_{22}^{-1 / 2} \boldsymbol{\beta}_{i}, \quad i=1, \ldots, k \tag{8}\]
  • The $i$th canonical correlation variables are \(\eta_i=\mathbf{a}_i^{\prime} \mathbf{x}\) and \(\phi_i=\mathbf{b}_i^{\prime} \mathbf{y}\). Note that \(\operatorname{Cov}\left(\eta_{i}, \eta_{j}\right)=\operatorname{Cov}\left(\phi_{i}, \phi_{j}\right)=\boldsymbol{\alpha}_{i}^{\prime} \boldsymbol{\alpha}_{j}=\boldsymbol{\beta}_{i}^{\prime} \boldsymbol{\beta}_{j}= \delta_{ij}\).

  • The $i$th canonical correlation coefficient is \(\rho_{i}=\lambda_{i}^{1 / 2}\).

Properties

Property 1 Fix \(r\), \(1 \leq r \leq k\), and add to the maximization problem stated in (3), \(\max _{a, b} \mathbf{a}^{\prime}\Sigma_{12} \mathbf{b}\), the further constraint \(\mathbf{a}_{i}^{\prime} \mathbf{\Sigma}_{11} \mathbf{a}=0, \quad i=1, \ldots, r-1\). Then the maximum is \(\rho_r\), attained when \(\mathbf{a}=\mathbf{a}_{r}, \mathbf{b}=\mathbf{b}_{r}\).

This is the largest correlation between linear combinations $\mathbf{a}^{\prime} \mathbf{x}$ and $\mathbf{b}^{\prime} \mathbf{y}$ subject to the restriction that $\mathbf{a}^{\prime} \mathbf{x}$ be uncorrelated with the first $r-1$ canonical correlation variables for $\mathbf{x}$. Here we restate the problem as a theorem:

Theorem 6

Fix $r, 1\leq r\leq k$ and let

\[\begin{aligned} &f_{r}=\max _{\mathbf{a,b}} \mathbf{a}^{\prime} \mathbf{\Sigma}_{12} \mathbf{b} \\ &\text{subject to}\\ &\mathbf{a}^{\prime} \mathbf{\Sigma}_{11} \mathbf{a}=1, \quad \mathbf{b}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=1, \quad \mathbf{a}_{i}^{\prime} \mathbf{\Sigma}_{11} \mathbf{a}=0, \quad i=1, \ldots, r-1 . \end{aligned} \tag{9}\]

Then the maximum is given by \(f_{r}=\rho_{r}\) and is attained when \(\mathbf{a}=\mathbf{a}_{r}, \mathbf{b}=\mathbf{b}_{r}\).

Proof

Since the sign of $\mathbf{a}^{\prime}\Sigma_{12} \mathbf{b}$ is irrelevant, it is convenient to solve (9) for $f_{r}^2$ instead of $f_{r}$.

  1. Fix $\mathbf{a}$ and maximize over $\mathbf{b}$; then $f_{r}^2$ becomes:
\[\max _{\mathbf{b}}\left(\mathbf{a}^{\prime} \mathbf{\Sigma}_{12} \mathbf{b}\right)^{2}=\max _{\mathbf{b}} \mathbf{b}^{\prime} \mathbf{\Sigma}_{21} \mathbf{a a}^{\prime} \mathbf{\Sigma}_{12} \mathbf{b} \text { subject to } \mathbf{b}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=1 \tag{10}\]

By Theorem 4, this maximum equals the largest eigenvalue of $\Sigma_{22}^{-1} \Sigma_{21} \mathbf{a} \mathbf{a}^{\prime} \Sigma_{12}$. To apply Corollary 1, let

\[\mathbf{A}_{p \times p} = \Sigma_{22}^{-1}, \quad \mathbf{B}_{q \times p} = \Sigma_{12}, \quad \mathbf{c}_{p \times 1} = \Sigma_{21}\mathbf{a}, \quad \text{and} \quad \mathbf{d}_{q \times 1} = \mathbf{a}.\]

Then $\Sigma_{22}^{-1} \Sigma_{21} \mathbf{a} \mathbf{a}^{\prime} \Sigma_{12} = \mathbf{Acd^{\prime}B}$.

This matrix has rank at most 1; by Corollary 1, its non-zero eigenvalue equals $\mathbf{d^{\prime}BAc}$ and its corresponding eigenvector is $\mathbf{Ac}$.

Hence,

\[\max _{\mathbf{b}}\left(\mathbf{a}^{\prime} \mathbf{\Sigma}_{12} \mathbf{b}\right)^{2}=\mathbf{d^{\prime}BAc} = \mathbf{a}^{\prime} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \mathbf{a},\tag{11}\]

and \(\arg \max_{\mathbf{b}}\left(\mathbf{a}^{\prime} \mathbf{\Sigma}_{12} \mathbf{b}\right)^{2} = \mathbf{Ac} = \Sigma_{22}^{-1}\Sigma_{21}\mathbf{a}\), up to the normalization \(\mathbf{b}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=1\).

  2. Maximize over $\mathbf{a}$ subject to the constraints in (9). Setting $\boldsymbol{\alpha}=\mathbf{\Sigma}_{11}^{1 / 2} \mathbf{a}$, that is, $\mathbf{a}=\mathbf{\Sigma}_{11}^{-1 / 2} \boldsymbol{\alpha}$, and substituting the maximum (11), the problem becomes
\[\begin{aligned} &\max _{\mathbf{a}} \mathbf{a}^{\prime} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \mathbf{a} \\ \equiv & \max _{\mathbf{\alpha}} \mathbf{\alpha}^{\prime} \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\Sigma_{11}^{-1/2}\mathbf{\alpha} \\ \equiv & \max _{\mathbf{\alpha}} \boldsymbol{\alpha}^{\prime} \mathbf{N}_{1} \boldsymbol{\alpha} \text { subject to } \boldsymbol{\alpha}^{\prime} \boldsymbol{\alpha}=1, \boldsymbol{\alpha}_{i}^{\prime} \boldsymbol{\alpha}=0, \quad i=1, \ldots, r-1 \end{aligned} \tag{12}\]

Note that \(\mathbf{a}_{i}=\mathbf{\Sigma}_{11}^{-1 / 2} \boldsymbol{\alpha}_{i}\) is the \(i\)th canonical correlation vector, and \(\boldsymbol{\alpha}_1, \ldots, \boldsymbol{\alpha}_{r-1}\) are the eigenvectors of \(\mathbf{N}_1\) corresponding to its \(r-1\) largest eigenvalues. Then by Theorem 5, the maximum of (12) is attained when \(\boldsymbol{\alpha} = \boldsymbol{\alpha}_r\), or equivalently \(\mathbf{a}=\mathbf{a}_{r}\).

Then

\[f_{r}^{2}=\boldsymbol{\alpha}_{r}^{\prime} \mathbf{N}_{1} \boldsymbol{\alpha}_{r}=\lambda_{r} \boldsymbol{\alpha}_{r}^{\prime} \boldsymbol{\alpha}_{r}=\lambda_{r}\]
  3. Therefore, from the decomposition (6), we have $\mathbf{K} \boldsymbol{\beta}_{r}=\lambda_{r}^{1/2} \boldsymbol{\alpha}_{r} = \rho_r \boldsymbol{\alpha}_r$. Substituting $\mathbf{a}_r$ and $\mathbf{b}_r$, we have
\[\mathbf{a}_{r}^{\prime} \Sigma_{12} \mathbf{b}_{r}= \boldsymbol{\alpha}_r^{\prime} \Sigma_{11}^{-1/2} \Sigma_{12} \Sigma_{22}^{-1/2}\boldsymbol{\beta}_r= \boldsymbol{\alpha}_{r}^{\prime} \mathbf{K} \boldsymbol{\beta}_{r}=\rho_{r} \boldsymbol{\alpha}_{r}^{\prime} \boldsymbol{\alpha}_{r}=\rho_{r} \quad \Box\]

A weaker but more symmetric form of the problem can be stated as follows:

Theorem 7 Let

\[g_{r}=\max _{a, b} a^{\prime} \mathbf{\Sigma}_{12} \mathbf{b}\]

subject to

\[\mathbf{a}^{\prime} \mathbf{\Sigma}_{11} \mathbf{a}=\mathbf{b}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=1, \quad \mathbf{a}_{i}^{\prime} \Sigma_{11} \mathbf{a}=\mathbf{b}_{i}^{\prime} \mathbf{\Sigma}_{22} \mathbf{b}=0, \quad i=1, \ldots, r-1 .\]

Then $g_{r}=\rho_{r}$ and the maximum is attained when \(\mathbf{a}=\mathbf{a}_{r}\) and \(\mathbf{b}=\mathbf{b}_{r}\).

Property 2 The covariance (and correlation) matrix of the canonical correlation variables has identity diagonal blocks and off-diagonal blocks \(\boldsymbol{\Lambda}^{1/2}=\mathbf{D}\), the diagonal matrix of canonical correlations:

\[V\left(\begin{array}{l} \eta \\ \phi \end{array}\right)=\left(\begin{array}{cc} \mathbf{I} & \Lambda^{1 / 2} \\ \Lambda^{1 / 2} & \mathbf{I} \end{array}\right)\]

This is straightforward to verify. Since \(\operatorname{Cov}\left(\eta_{i}, \eta_{j}\right)=\operatorname{Cov}\left(\phi_{i}, \phi_{j}\right)=\boldsymbol{\alpha}_{i}^{\prime} \boldsymbol{\alpha}_{j}=\boldsymbol{\beta}_{i}^{\prime} \boldsymbol{\beta}_{j}= \delta_{ij}\), we have \(V(\boldsymbol{\eta})=V(\boldsymbol{\phi})=\mathbf{I}\).

From Theorem 6, the \(i\)th canonical correlation is \(\rho_i = \lambda_i^{1/2}\). And from the decomposition (6), \(\operatorname{Cov}\left(\eta_{i}, \phi_{j}\right)=\mathbf{a}_{i}^{\prime} \Sigma_{12} \mathbf{b}_{j}=\boldsymbol{\alpha}_{i}^{\prime} \mathbf{K} \boldsymbol{\beta}_{j}=\rho_{j} \boldsymbol{\alpha}_{i}^{\prime} \boldsymbol{\alpha}_{j}=\rho_{i} \delta_{i j}\), which equals \(\rho_i\) when \(i=j\) and \(0\) otherwise.

Hence the covariance matrix of the canonical correlation variables has the stated form.
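
The following sketch verifies Property 2 on simulated data, building the canonical vectors directly from the sample covariance blocks via the SVD of \(\mathbf{K}\) (isqrt() is again our own inverse-square-root helper). The block structure holds exactly here because the same sample covariances are used both to construct the vectors and to evaluate the covariance of the scores:

## Sketch: empirical check of Property 2
set.seed(7)
n <- 1000
X <- matrix(rnorm(n * 2), n, 2)
Y <- X %*% matrix(rnorm(6), 2, 3) + matrix(rnorm(n * 3), n, 3)
S11 <- cov(X); S22 <- cov(Y); S12 <- cov(X, Y)
isqrt <- function(M) { e <- eigen(M); e$vectors %*% diag(1/sqrt(e$values)) %*% t(e$vectors) }
sv <- svd(isqrt(S11) %*% S12 %*% isqrt(S22))   # SVD of K, as in (6)
A <- isqrt(S11) %*% sv$u                       # canonical vectors a_i as columns
B <- isqrt(S22) %*% sv$v                       # canonical vectors b_i as columns
eta <- X %*% A; phi <- Y %*% B                 # canonical correlation variables
round(cov(cbind(eta, phi)), 3)                 # = [I, D; D, I] with D = diag(sv$d)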

Property 3 CCA is invariant under scaling and, more generally, under non-singular linear transformations.

Theorem 8 If \(\mathbf{x}^{*}=\mathbf{U}^{\prime} \mathbf{x}+\mathbf{u}\) and \(\mathbf{y}^{*}=\mathbf{V}^{\prime} \mathbf{y}+\mathbf{v}\), where \(U(q \times q)\) and \(V(p \times p)\) are non-singular matrices and \(\mathbf{u}(q \times 1), \mathbf{v}(p \times 1)\) are fixed vectors, then

(a) the canonical correlations between \(\mathbf{x}^{*}\) and \(\mathbf{y}^{*}\) are the same as those between \(\mathbf{x}\) and \(\mathbf{y}\);

(b) the canonical correlation vectors for \(\mathbf{x}^{*}\) and \(\mathbf{y}^{*}\) are given by \(\mathbf{a}_{i}^{*}=\mathbf{U}^{-1} \mathbf{a}_{i}\) and \(\mathbf{b}_{i}^{*}=\mathbf{V}^{-1} \mathbf{b}_{i}\), \(i=1, \ldots, k\), where \(\mathbf{a}_{i}\) and \(\mathbf{b}_{i}\) are the canonical correlation vectors for \(\mathbf{x}\) and \(\mathbf{y}\).

Sample CCA

CCA can be applied to the sample estimates \(\mathbf{S}_{11}\), \(\mathbf{S}_{12}\), and \(\mathbf{S}_{22}\) of \(\boldsymbol{\Sigma}_{11}\), \(\boldsymbol{\Sigma}_{12}\), and \(\boldsymbol{\Sigma}_{22}\). If the data are normal, then \(\mathbf{S}\) is the MLE of \(\boldsymbol{\Sigma}\), so the sample CCA values are the MLEs of the corresponding population values.

\[\begin{aligned} \mathbf{S}_{11} &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_{i} \mathbf{X}_{i}^{T}-\overline{\mathbf{X}} \overline{\mathbf{X}}^{T}, & \mathbf{S}_{22} &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{Y}_{i} \mathbf{Y}_{i}^{T}-\overline{\mathbf{Y}} \overline{\mathbf{Y}}^{T} \\ \mathbf{S}_{12} &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_{i} \mathbf{Y}_{i}^{T}-\overline{\mathbf{X}} \overline{\mathbf{Y}}^{T}, & \mathbf{S}_{21} &=\frac{1}{n} \sum_{i=1}^{n} \mathbf{Y}_{i} \mathbf{X}_{i}^{T}-\overline{\mathbf{Y}} \overline{\mathbf{X}}^{T} \end{aligned}\]
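
As a sketch, these blocks can be computed compactly with crossprod() on centered data, via a small helper of our own (sample_blocks() below is not a base R function). Note the 1/n divisor above: R's var() and cov() divide by n - 1 instead, but the canonical correlations are unaffected by the choice of divisor, since it cancels in the ratio (2).

## Sketch: sample covariance blocks with the 1/n convention
sample_blocks <- function(X, Y) {
  n  <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)   # rows X_i - Xbar
  Yc <- scale(Y, center = TRUE, scale = FALSE)   # rows Y_i - Ybar
  list(S11 = crossprod(Xc) / n, S22 = crossprod(Yc) / n,
       S12 = crossprod(Xc, Yc) / n, S21 = crossprod(Yc, Xc) / n)
}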

Different Views of CCA

A geometric view of CCA is to minimize the angle \(\theta \in [0, \frac{\pi}{2}]\) between the images \(\mathbf{z}_{a}\) and \(\mathbf{z}_{b}\) in \(\mathbb{R}^{n}\), where

\[X_{a} \mathbf{w}_{a}=\mathbf{z}_{a} \quad \text { and } \quad X_{b} \mathbf{w}_{b}=\mathbf{z}_{b}\]

This is equivalent to maximizing the cosine of the angle between \(\mathbf{z}_{a}\) and \(\mathbf{z}_{b}\). Assuming \(\mathbf{z}_{a}\) and \(\mathbf{z}_{b}\) have unit norm, we have the following maximization problem for the first pair of canonical correlation variates:

\[\begin{gathered} \cos \theta_{1}=\max _{\mathbf{z}_{a}, \mathbf{z}_{b} \in \mathbb{R}^{n}}\left\langle\mathbf{z}_{a}, \mathbf{z}_{b}\right\rangle, \\ \left\|\mathbf{z}_{a}\right\|_{2}=1 \quad\left\|\mathbf{z}_{b}\right\|_{2}=1 \end{gathered} \tag{13}\]

The second pair of canonical variates is found in the orthogonal complement of the first pair, and so on; the procedure ends when no more pairs can be found. Assuming the data matrices are centered and scaled, the \(r\)th subproblem is therefore:

\[\begin{gathered} \cos \theta_{r}=\max _{\mathbf{z}_{a}, \mathbf{z}_{b} \in \mathbb{R}^{n}}\left\langle\mathbf{z}_{a}^{r}, \mathbf{z}_{b}^{r}\right\rangle = \max _{\mathbf{w}_a, \mathbf{w}_b} \mathbf{w}_a^{r\prime}X_a^{\prime}X_b\mathbf{w}_b^r =\max_{\mathbf{w}_a, \mathbf{w}_b} \mathbf{w}_{a}^{r \prime} S_{a b} \mathbf{w}_{b}^r, \\ \left\|\mathbf{z}_{a}^{r}\right\|_{2}^2=\mathbf{w}_{a}^{r \prime} X_{a}^{\prime} X_{a} \mathbf{w}_{a}^r=\mathbf{w}_{a}^{r\prime} S_{a a} \mathbf{w}_{a}^r= 1 \\ \quad\left\|\mathbf{z}_{b}^{r}\right\|_{2}^2=\mathbf{w}_{b}^{r \prime} X_{b}^{\prime} X_{b} \mathbf{w}_{b}^r=\mathbf{w}_{b}^{r\prime} S_{b b} \mathbf{w}_{b}^r=1 ,\\ \left\langle\mathbf{z}_{a}^{r}, \mathbf{z}_{a}^{j}\right\rangle=0 \quad\left\langle\mathbf{z}_{b}^{r}, \mathbf{z}_{b}^{j}\right\rangle=0, \\ \forall j \neq r: \quad j, r=1,2, \ldots, \min (p, q) . \end{gathered}\]

where the $\mathbf{S}$ matrices are the sample estimates defined in the previous section.
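
As an illustrative sketch of this view, we can check numerically that the cosine of the angle between the first pair of centered images equals the first canonical correlation, here using base R's cancor() and the LifeCycleSavings data that also appears in the next section:

## Sketch: cos(theta_1) equals the first canonical correlation
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
cc  <- cancor(pop, oec)
za  <- scale(as.matrix(pop), center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1]
zb  <- scale(as.matrix(oec), center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1]
sum(za * zb) / sqrt(sum(za^2) * sum(zb^2))     # ~ 0.8248 = cc$cor[1]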

Implement CCA in R

R has its own base function for CCA: cancor(). We can also write our own function to implement it:

CC <- function(X, Y, xscale = TRUE, yscale = TRUE){
  if(xscale){
    X <- scale(X)
  }
  if(yscale){
    Y <- scale(Y)
  }
  v.x <- var(X); v.y <- var(Y); c.xy <- cov(X, Y)
  ## inverse square roots Sigma_11^{-1/2}, Sigma_22^{-1/2} via SVD
  svd.x <- svd(v.x); svd.y <- svd(v.y)
  inv.x <- svd.x$v %*% (t(svd.x$u) * 1/sqrt(svd.x$d))
  inv.y <- svd.y$v %*% (t(svd.y$u) * 1/sqrt(svd.y$d))
  ## the K matrix of (5) and its SVD, as in (6)
  K <- inv.x %*% c.xy %*% inv.y
  svd.K <- svd(K)
  ## canonical correlations rho_i and canonical vectors a_i, b_i from (8)
  d <- svd.K$d
  xcoef <- inv.x %*% svd.K$u
  ycoef <- inv.y %*% svd.K$v
  list(corr = d, xcoef = xcoef, ycoef = ycoef)
}

Here we use the toy dataset LifeCycleSavings to illustrate the results:

## Prepare X and Y
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]
## Run CCA without scaling
CC(pop, oec, xscale = FALSE, yscale = FALSE)
#$corr
#[1] 0.8247966 0.3652762
#
#$xcoef
#            [,1]      [,2]
#[1,] -0.06377599 0.2535544
#[2,]  0.34053260 1.8221811
#
#$ycoef
#             [,1]          [,2]
#[1,] 0.0592971550 -0.2336554912
#[2,] 0.0009151786  0.0005311762
#[3,] 0.0291942000  0.0858752749

## Run CCA with scaling
CC(pop, oec, xscale = TRUE, yscale = TRUE)
#$corr
#[1] 0.8247966 0.3652762
#
#$xcoef
#           [,1]     [,2]
#[1,] -0.5836605 2.320461
#[2,]  0.4395497 2.352019
#
#$ycoef
#           [,1]       [,2]
#[1,] 0.26567538 -1.0468717
#[2,] 0.90682202  0.5263260
#[3,] 0.08378358  0.2464509

We can see that the canonical correlations are unchanged by scaling the variables, as Property 3 promises, while the coefficient vectors are rescaled accordingly.
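
As a final check, the base function cancor() mentioned above returns the same canonical correlations on this data (its coefficient scaling convention differs from ours, so xcoef and ycoef need not match):

## Compare with base R's cancor()
cancor(pop, oec)$cor
# [1] 0.8247966 0.3652762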


This post mainly follows Chapter 10, Canonical Correlation Analysis, of Multivariate Analysis by K. V. Mardia, J. T. Kent and J. M. Bibby.

Reference

Mardia, Kanti V., et al. "10 Canonical Correlation Analysis." Multivariate Analysis, Academic Press, London, 1992, pp. 281–299.

Uurtio, Viivi, et al. “A Tutorial on Canonical Correlation Methods.” ACM Computing Surveys, vol. 50, no. 6, 2018, pp. 1–33., https://doi.org/10.1145/3136624.