What is the Hyperspace Analogue to Language (HAL) model?


Also known as semantic memory, the model was developed by Kevin Lund and Curt Burgess of the University of California, Riverside, in 1996. The basic premise of the work is that words with similar meanings repeatedly occur close to one another (known as co-occurrence). For example, in a large corpus of text one could expect the words mouse, dog and cat to appear close to each other frequently. The same might be true for Japan, Malaysia and Singapore.

One could now create a co-occurrence square matrix of a text where each of the n unique words is represented as both a row and a column, giving a matrix of size n x n. For each word, a, we count the number of times every other word, b, occurs in close proximity to a. The counting is actually done using weighted co-occurrences. For example, if b occurs adjacent to a, it receives a weighting of 5. If b is separated from a by one word, it receives a weighting of 4, and so forth, down to a weighting of 1 for distance-5 neighbours. This is called simple sliding-window parsing with a window size of 5.
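As an illustration of this counting scheme, here is a minimal Python sketch. The whitespace tokenisation, the symmetric update of both cells, and the toy sentence are simplifying assumptions made for this example; they are not prescribed by the original work.

```python
from collections import defaultdict

def hal_matrix(tokens, window=5):
    """Build a weighted co-occurrence table with a sliding window.

    A neighbour directly adjacent to the focus word gets weight 5,
    one word away gets 4, and so on down to 1 at distance `window`.
    Counts are accumulated symmetrically, matching the example matrix below.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for dist in range(1, window + 1):
            j = i + dist
            if j >= len(tokens):
                break
            weight = window + 1 - dist      # 5, 4, 3, 2, 1
            counts[word][tokens[j]] += weight
            counts[tokens[j]][word] += weight
    return counts

# Hypothetical toy corpus, just to show the shape of the result.
text = "the cat chased the mouse while the dog watched the cat"
m = hal_matrix(text.split())
print(m["cat"]["mouse"], m["cat"]["dog"])
```

Applied to a large corpus, the accumulated counts form a matrix like the simplified example below.
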
           mouse  dog  cat  Japan  Malaysia  Singapore
mouse          0    3    4      0         0          0
dog            3    0    5      0         0          0
cat            4    5    0      0         0          0
Japan          0    0    0      0         4          3
Malaysia       0    0    0      4         0          5
Singapore      0    0    0      3         5          0

As a result, co-occurring words have similar rows. Looking at the simplified example in the matrix above, we can see that mouse, cat and dog have similar rows, and so do Japan, Malaysia and Singapore.

The “distance” between vectors then becomes a proxy for the similarity of meaning of the words the vectors represent. This “distance” is often measured as the cosine of the angle between two vectors. As a result, identical vectors, pointing in the same direction, have an angle of 0 degrees and a cosine value of 1, while unrelated vectors are orthogonal, with an angle of 90 degrees and a cosine value of 0. To ease the cosine calculation, matrices are often normalised along the rows so that each row vector has unit length 1.

The example also shows that even words that rarely co-occur directly can share meaning. Dog, for example, appears close to mouse far less often than to cat, but through its shared meaning with cat it also shares meaning with mouse. As a result, one can group words by their meaning even when they share it only indirectly.
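To make the grouping concrete, here is a small sketch using numpy (an assumed convenience, not part of the original work) that normalises the rows of the example matrix above and prints the pairwise cosine similarities:

```python
import numpy as np

words = ["mouse", "dog", "cat", "Japan", "Malaysia", "Singapore"]
M = np.array([
    [0, 3, 4, 0, 0, 0],
    [3, 0, 5, 0, 0, 0],
    [4, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 3],
    [0, 0, 0, 4, 0, 5],
    [0, 0, 0, 3, 5, 0],
], dtype=float)

# Normalise each row to unit length; cosine similarity then reduces to a dot product.
unit = M / np.linalg.norm(M, axis=1, keepdims=True)
sims = unit @ unit.T

for a in range(len(words)):
    for b in range(a + 1, len(words)):
        print(f"{words[a]:9s} vs {words[b]:9s}: {sims[a, b]:.2f}")
```

Within each group the similarities are well above zero, while across the two groups they are exactly zero, since the corresponding row vectors have no overlapping non-zero entries.
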

While similar experiments had been done before Lund and Burgess published their work, it was still a great breakthrough. Their approach is completely automated and, unlike earlier work, does not rely on humans selecting dimensions and training semantic vector spaces. Only the information in the corpus is used to create the matrix and the resulting vector space, so no external bias is introduced by human actors.
Nov 1, 3:09pm

The Hyperspace Analogue to Language (HAL) model, also known as semantic memory, was developed by Kevin Lund and Curt Burgess from the University of California, Riverside, in 1996. The model is based on the premise that words with similar meanings tend to co-occur frequently in close proximity within a large corpus of text.

Key Concepts of the HAL Model:

  1. Co-occurrence Matrix:

    • The HAL model creates a co-occurrence matrix where each unique word in the text is represented as both a row and a column.
    • The matrix size is n × n, where n is the number of unique words.
    • For each word a, the model counts the number of times every other word b occurs in close proximity to a.
  2. Weighted Co-occurrences:

    • The counting is done using weighted co-occurrences. For example, if word b occurs adjacent to word a, it receives a weighting of 5. If b is separated from a by one word, it receives a weighting of 4, and so forth down to a weighting of 1 for distance-5 neighbors.
    • This is called a simple sliding window parsing of size 5.
  3. Example Matrix:

               mouse  dog  cat  Japan  Malaysia  Singapore
    mouse          0    3    4      0         0          0
    dog            3    0    5      0         0          0
    cat            4    5    0      0         0          0
    Japan          0    0    0      0         4          3
    Malaysia       0    0    0      4         0          5
    Singapore      0    0    0      3         5          0
  4. Similarity Measurement:

    • Words that co-occur frequently have similar rows in the matrix.
    • The "distance" between vectors (rows) becomes a proxy for the similarity of meanings of the words represented by the vectors.
    • The distance is often measured as the cosine of the angle between two vectors. Identical vectors have an angle of 0 degrees and a cosine value of 1, while unrelated vectors are orthogonal with an angle of 90 degrees and a cosine value of 0.
  5. Normalization:

    • To ease the cosine calculation, matrices are often normalized along the rows to the unit length of 1 of the row vectors.
  6. Indirect Meaning Sharing:

    • The model shows that even words that rarely co-occur directly can share meaning. For example, "dog" appears close to "mouse" far less often than to "cat," but through its shared meaning with "cat," it also shares meaning with "mouse."

Significance:

  • The HAL model was a significant breakthrough because it is completely automated and does not rely on humans selecting dimensions and training semantic vector spaces.
  • It uses only the information in a corpus to create the matrix and the resulting vector space, thus avoiding external bias through human influence.

Comments:

  1. Automation and Objectivity:

    • One of the key strengths of the HAL model is its automation. By relying solely on the corpus data, it avoids the subjectivity that can arise from human-selected dimensions. This makes the model more objective and consistent.
  2. Scalability and Applicability:

    • The HAL model is highly scalable and can be applied to large corpora of text. Its ability to capture semantic relationships without manual intervention makes it a powerful tool for natural language processing tasks such as word sense disambiguation, semantic similarity measurement, and information retrieval.

In summary, the HAL model is a pioneering approach in semantic memory that leverages co-occurrence patterns to automatically derive meaningful relationships between words, making it a valuable tool in the field of computational linguistics.

Nov 1, 3:09pm
When building the matrix on a corpus, if a word A is found twice, the first time right after the word B and the second time three words later, what value will you enter in the matrix?
Nov 1, 3:25pm

When building the co-occurrence matrix in the HAL model and a word A is found twice in relation to another word B, you need to consider the weighted co-occurrences based on the proximity of the words.

Here's how you would handle the scenario:

  1. First Occurrence:

    • If word A is found right after word B, it receives a weighting of 5 (since it is adjacent).
  2. Second Occurrence:

    • If word A is found three words after word B, it receives a weighting of 3 (since it is separated by two words).

To determine the value to enter in the matrix, you sum the weighted co-occurrences for each position.

Example Calculation:

  • First Occurrence: Weight = 5
  • Second Occurrence: Weight = 3

The total weight for the co-occurrence of A and B would be the sum of these weights:

Total weight = 5 + 3 = 8

So, you would enter the value 8 in the matrix cell corresponding to the co-occurrence of words A and B.

Matrix Update:

If A is the row and B is the column, you would update the matrix as follows:

Matrix[A][B] = 8

Similarly, if B is the row and A is the column, you would update the matrix as follows:

Matrix[B][A] = 8

This ensures that the matrix captures the cumulative weighted co-occurrences of words A and B based on their proximity in the text.
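As a small, hypothetical sketch of this accumulation (the dictionary-of-dictionaries layout and the helper function are illustrative assumptions, not a prescribed data structure), both sightings simply add their weights into the same cell:

```python
from collections import defaultdict

# Matrix cells accumulate: each sighting adds its distance weight.
matrix = defaultdict(lambda: defaultdict(int))

def add_cooccurrence(matrix, a, b, distance, window=5):
    """Add the weight for one sighting of `a` at `distance` words after `b`."""
    weight = window + 1 - distance          # 5 for adjacent, then 4, 3, ...
    matrix[a][b] += weight
    matrix[b][a] += weight                  # symmetric, as in the answer above

add_cooccurrence(matrix, "A", "B", distance=1)   # right after B   -> +5
add_cooccurrence(matrix, "A", "B", distance=3)   # three words on  -> +3
print(matrix["A"]["B"])                          # 8
```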

Nov 1, 3:25pm
And how do we turn the elements of the rows into vectors?
Nov 1, 4:05pm

To turn the elements of the rows in the co-occurrence matrix into vectors, you essentially treat each row as a vector in a high-dimensional space. Here's a step-by-step process to convert the rows into vectors:

Step-by-Step Process:

  1. Construct the Co-occurrence Matrix:

    • Build the n × n co-occurrence matrix as described earlier, where n is the number of unique words in the corpus.
    • Each cell in the matrix contains the weighted co-occurrence value between two words.
  2. Extract Rows as Vectors:

    • Each row in the matrix corresponds to a word and represents its co-occurrence relationship with all other words.
    • Treat each row as a vector in an n-dimensional space.

Example:

Let's consider a simplified example with a small vocabulary for clarity.

Vocabulary:

  • Words: {mouse, dog, cat, Japan, Malaysia, Singapore}

Co-occurrence Matrix:

           mouse  dog  cat  Japan  Malaysia  Singapore
mouse          0    3    4      0         0          0
dog            3    0    5      0         0          0
cat            4    5    0      0         0          0
Japan          0    0    0      0         4          3
Malaysia       0    0    0      4         0          5
Singapore      0    0    0      3         5          0

Row Vectors:

  • Vector for "mouse": [0, 3, 4, 0, 0, 0]
  • Vector for "dog": [3, 0, 5, 0, 0, 0]
  • Vector for "cat": [4, 5, 0, 0, 0, 0]
  • Vector for "Japan": [0, 0, 0, 0, 4, 3]
  • Vector for "Malaysia": [0, 0, 0, 4, 0, 5]
  • Vector for "Singapore": [0, 0, 0, 3, 5, 0]
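A minimal sketch of this extraction step, assuming the matrix is held as a numpy array in the same row order as the word list above (numpy is an assumed convenience here, not part of the HAL model):

```python
import numpy as np

words = ["mouse", "dog", "cat", "Japan", "Malaysia", "Singapore"]
M = np.array([
    [0, 3, 4, 0, 0, 0],
    [3, 0, 5, 0, 0, 0],
    [4, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 3],
    [0, 0, 0, 4, 0, 5],
    [0, 0, 0, 3, 5, 0],
], dtype=float)

# The vector for a word is simply the matrix row at that word's index.
vector_mouse = M[words.index("mouse")]
print(vector_mouse)   # [0. 3. 4. 0. 0. 0.]
```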

Normalization (Optional but Common):

To ease the calculation of similarity measures like cosine similarity, it is common to normalize the vectors to unit length. This means each vector is scaled so that its length (Euclidean norm) is 1.

Normalization Formula:

Normalized vector = Vector / ∥Vector∥

where ∥Vector∥ is the Euclidean norm (length) of the vector.

Example Normalization:

For the vector of "mouse":

Vector_mouse = [0, 3, 4, 0, 0, 0]

The Euclidean norm is:

∥Vector_mouse∥ = √(0² + 3² + 4² + 0² + 0² + 0²) = √(0 + 9 + 16 + 0 + 0 + 0) = √25 = 5

The normalized vector is:

Normalized vector_mouse = [0/5, 3/5, 4/5, 0/5, 0/5, 0/5] = [0, 0.6, 0.8, 0, 0, 0]
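A quick check of this hand calculation in numpy (assumed here purely for illustration):

```python
import numpy as np

mouse = np.array([0, 3, 4, 0, 0, 0], dtype=float)
norm = np.linalg.norm(mouse)              # Euclidean length: sqrt(9 + 16) = 5.0
unit = mouse / norm                       # [0, 0.6, 0.8, 0, 0, 0]
print(norm, unit, np.linalg.norm(unit))   # the normalised vector has length 1
```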

Summary:

  • Each row in the co-occurrence matrix is treated as a vector.
  • These vectors represent the co-occurrence relationships of each word with all other words.
  • Optionally, normalize the vectors to unit length to facilitate similarity calculations.

By converting the rows into vectors, you can then use these vectors for various natural language processing tasks, such as calculating semantic similarity between words using cosine similarity or other distance metrics.

Nov 1, 4:05pm
How do you measure the angle between two vectors? as indicated in the text:
"The “distance” between vectors then becomes a proxy for the similarity of meanings of the words represented by the vectors. The “distance” often is measured as the cosine of the angle between two vectors. As a result identical vectors, pointing in the same direction, have an angle of 0 degrees and a cosine value of 1. Unrelated vectors would be orthogonal with an angle of 90 degrees and a cosine value of 0. To ease the cosine calculation matrices are often normalised along the rows to the unit length of 1 of the row vectors."
Nov 8, 5:24pm

To measure the angle between two vectors, you can use the cosine similarity, which is a common measure in vector space models like the HAL model. The cosine similarity between two vectors is the cosine of the angle between them. Here's how you can calculate it:

Cosine Similarity Formula:

The cosine similarity between two vectors A and B is given by:

cosine similarity = cos(θ) = (A ⋅ B) / (∥A∥ ∥B∥)

Where:

  • A ⋅ B is the dot product of vectors A and B.
  • ∥A∥ is the Euclidean norm (length) of vector A.
  • ∥B∥ is the Euclidean norm (length) of vector B.

Steps to Calculate Cosine Similarity:

  1. Dot Product:
    • Calculate the dot product of the two vectors.
      A ⋅ B = A₁B₁ + A₂B₂ + … + AₙBₙ
  2. Euclidean Norm:
    • Calculate the Euclidean norm of each vector.
      ∥A∥ = √(A₁² + A₂² + … + Aₙ²)
      ∥B∥ = √(B₁² + B₂² + … + Bₙ²)
  3. Cosine Similarity:
    • Use the dot product and the Euclidean norms to calculate the cosine similarity.
      cosine similarity = (A ⋅ B) / (∥A∥ ∥B∥)

Example:

Let's calculate the cosine similarity between two vectors, A and B, from the previous example:

  • A = [0, 3, 4, 0, 0, 0] (vector for "mouse")
  • B = [3, 0, 5, 0, 0, 0] (vector for "dog")

  1. Dot Product:
      A ⋅ B = (0 ⋅ 3) + (3 ⋅ 0) + (4 ⋅ 5) + (0 ⋅ 0) + (0 ⋅ 0) + (0 ⋅ 0) = 0 + 0 + 20 + 0 + 0 + 0 = 20
  2. Euclidean Norms:
      ∥A∥ = √(0² + 3² + 4² + 0² + 0² + 0²) = √(0 + 9 + 16 + 0 + 0 + 0) = √25 = 5
      ∥B∥ = √(3² + 0² + 5² + 0² + 0² + 0²) = √(9 + 0 + 25 + 0 + 0 + 0) = √34 ≈ 5.83
  3. Cosine Similarity:
      cosine similarity = 20 / (5 ⋅ √34) ≈ 20 / 29.15 ≈ 0.686
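The same calculation can be reproduced in a few lines of numpy (assumed here purely for illustration):

```python
import numpy as np

A = np.array([0, 3, 4, 0, 0, 0], dtype=float)   # "mouse"
B = np.array([3, 0, 5, 0, 0, 0], dtype=float)   # "dog"

# Dot product divided by the product of the Euclidean norms.
cos_sim = A @ B / (np.linalg.norm(A) * np.linalg.norm(B))
print(round(cos_sim, 3))   # ~0.686
```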

Interpretation:

  • A cosine similarity of 1 indicates that the vectors are identical (angle of 0 degrees).
  • A cosine similarity of 0 indicates that the vectors are orthogonal (angle of 90 degrees), meaning they are unrelated.
  • A cosine similarity between 0 and 1 indicates the degree of similarity between the vectors.

Normalization:

If the vectors are already normalized to unit length (i.e., ∥A∥ = 1 and ∥B∥ = 1), the cosine similarity simplifies to just the dot product:

cosine similarity = A ⋅ B

This is why normalizing the vectors to unit length is often done to ease the calculation of cosine similarity.

By measuring the cosine similarity, you can determine the semantic similarity between words based on their co-occurrence patterns in the HAL model.

Nov 8, 5:24pm
can you show me an example for the normalized vector calculation of cosine similarity?
Nov 8, 5:34pm

Certainly! Let's go through an example of calculating the cosine similarity between two normalized vectors.

Example:

Consider the following vectors for the words "mouse" and "dog" from the previous example:

  • A = [0, 3, 4, 0, 0, 0] (vector for "mouse")
  • B = [3, 0, 5, 0, 0, 0] (vector for "dog")

Step 1: Normalize the Vectors

First, we need to normalize these vectors to unit length.

Normalization of Vector A:

  1. Calculate the Euclidean norm of A:
      ∥A∥ = √(0² + 3² + 4² + 0² + 0² + 0²) = √(0 + 9 + 16 + 0 + 0 + 0) = √25 = 5
  2. Normalize A by dividing each component by its norm:
      A_normalized = [0/5, 3/5, 4/5, 0/5, 0/5, 0/5] = [0, 0.6, 0.8, 0, 0, 0]

Normalization of Vector B:

  1. Calculate the Euclidean norm of B:
      ∥B∥ = √(3² + 0² + 5² + 0² + 0² + 0²) = √(9 + 0 + 25 + 0 + 0 + 0) = √34 ≈ 5.83
  2. Normalize B by dividing each component by its norm:
      B_normalized = [3/5.83, 0/5.83, 5/5.83, 0/5.83, 0/5.83, 0/5.83] ≈ [0.515, 0, 0.857, 0, 0, 0]

Step 2: Calculate the Dot Product of the Normalized Vectors

Now, calculate the dot product of the normalized vectors A_normalized and B_normalized:

A_normalized ⋅ B_normalized = (0 ⋅ 0.515) + (0.6 ⋅ 0) + (0.8 ⋅ 0.857) + (0 ⋅ 0) + (0 ⋅ 0) + (0 ⋅ 0) = 0 + 0 + 0.6856 + 0 + 0 + 0 = 0.6856

Step 3: Calculate the Cosine Similarity

Since the vectors are already normalized, the cosine similarity is simply the dot product of the normalized vectors:

cosine similarity = A_normalized ⋅ B_normalized = 0.6856
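As a small sketch (again assuming numpy), normalising first and then taking the dot product gives the same number as the full cosine formula:

```python
import numpy as np

A = np.array([0, 3, 4, 0, 0, 0], dtype=float)   # "mouse"
B = np.array([3, 0, 5, 0, 0, 0], dtype=float)   # "dog"

A_unit = A / np.linalg.norm(A)
B_unit = B / np.linalg.norm(B)

print(A_unit @ B_unit)                                   # ~0.6860, dot product of unit vectors
print(A @ B / (np.linalg.norm(A) * np.linalg.norm(B)))   # identical value from the full formula
```

The small difference from the 0.6856 above comes only from rounding the normalised components to three decimals in the hand calculation.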

Interpretation:

The cosine similarity of approximately 0.6856 indicates a moderate degree of similarity between the vectors for "mouse" and "dog". A value of 1 would indicate identical vectors, and a value of 0 would indicate orthogonal (unrelated) vectors.

Summary:

  1. Normalize the vectors to unit length.
  2. Calculate the dot product of the normalized vectors.
  3. The cosine similarity is the dot product of the normalized vectors.

This process allows you to measure the semantic similarity between words based on their co-occurrence patterns in the HAL model.

Nov 8, 5:34pm
