
    QdQ                         d Z ddlZddlmZ ddlmZ ddlmZmZm	Z	m
Z
 ddlmZmZ  G d de	          Z G d	 d
e          ZdS )u\  
Translation model that reorders output words based on their type and
distance from other related words in the output sentence.

IBM Model 4 improves the distortion model of Model 3, motivated by the
observation that certain words tend to be re-ordered in a predictable
way relative to one another. For example, <adjective><noun> in English
usually has its order flipped as <noun><adjective> in French.

Model 4 requires words in the source and target vocabularies to be
categorized into classes. This can be linguistically driven, like parts
of speech (adjectives, nouns, prepositions, etc.). Word classes can also
be obtained by statistical methods. The original IBM Model 4 uses an
information-theoretic approach to group words into 50 classes for each
vocabulary.

Terminology
-----------

:Cept:
    A source word with non-zero fertility, i.e. one that is aligned to
    one or more target words.
:Tablet:
    The set of target word(s) aligned to a cept.
:Head of cept:
    The first word of the tablet of that cept.
:Center of cept:
    The average position of the words in that cept's tablet. If the
    value is not an integer, the ceiling is taken.
    For example, for a tablet with words in positions 2, 5, 6 in the
    target sentence, the center of the corresponding cept is
    ceil((2 + 5 + 6) / 3) = 5
:Displacement:
    For a head word, defined as (position of head word - position of
    previous cept's center). Can be positive or negative.
    For a non-head word, defined as (position of non-head word -
    position of previous word in the same tablet). Always positive,
    because successive words in a tablet are assumed to appear to the
    right of the previous word.
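
The definitions above can be illustrated with a minimal sketch. The
function names are hypothetical and are not part of this module, and the
head of a tablet is assumed to be its leftmost position. Positions are
1-based target sentence positions.

```python
from math import ceil


def center_of_cept(tablet_positions):
    """Ceiling of the average position of the words in a tablet."""
    return ceil(sum(tablet_positions) / len(tablet_positions))


def head_displacement(tablet_positions, previous_tablet_positions):
    """Head word position minus the center of the previous cept."""
    head = min(tablet_positions)  # assumption: head = leftmost word
    return head - center_of_cept(previous_tablet_positions)


def non_head_displacements(tablet_positions):
    """Gaps between successive words of one tablet (always positive)."""
    ordered = sorted(tablet_positions)
    return [right - left for left, right in zip(ordered, ordered[1:])]


print(center_of_cept([2, 5, 6]))             # ceil(13 / 3)
print(head_displacement([7, 8], [2, 5, 6]))  # 7 minus the center above
print(non_head_displacements([2, 5, 6]))     # gaps 2->5 and 5->6
```

A head displacement can be negative when the current tablet starts to
the left of the previous cept's center; the non-head gaps cannot.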

In contrast to Model 3, which reorders words in a tablet independently of
other words, Model 4 distinguishes between three cases.

1. Words generated by NULL are distributed uniformly.
2. For a head word t, its position is modeled by the probability
   d_head(displacement | word_class_s(s), word_class_t(t)),
   where s is the previous cept, and word_class_s and word_class_t map
   s and t to a source and target language word class respectively.
3. For a non-head word t, its position is modeled by the probability
   d_non_head(displacement | word_class_t(t))
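
The three cases can be sketched as a single lookup. This is an
illustration under assumptions, not the code of this module: the table
layouts follow the indexing used by ``head_distortion_table`` and
``non_head_distortion_table`` in the class below, while the function
name, its parameters, and the ``MIN_PROB`` value are hypothetical.
NULL-generated words are given a constant factor of 1.0 here, on the
assumption that their uniform placement is accounted for elsewhere.

```python
from collections import defaultdict

MIN_PROB = 1.0e-12  # assumed floor for unseen events

# head_distortion_table[dj][src_class][trg_class]
head_distortion_table = defaultdict(
    lambda: defaultdict(lambda: defaultdict(lambda: MIN_PROB))
)
# non_head_distortion_table[dj][trg_class]
non_head_distortion_table = defaultdict(
    lambda: defaultdict(lambda: MIN_PROB)
)


def distortion_probability(dj, trg_class, is_head,
                           src_class=None, from_null=False):
    """Pick the distortion factor for one target word."""
    if from_null:
        # Case 1: NULL-generated word, constant factor (assumption)
        return 1.0
    if is_head:
        # Case 2: conditioned on the previous cept's source class
        return head_distortion_table[dj][src_class][trg_class]
    # Case 3: conditioned on the target word class only
    return non_head_distortion_table[dj][trg_class]
```

Using nested ``defaultdict`` objects means an unseen combination of
displacement and word classes falls back to the floor value instead of
raising ``KeyError``.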

The EM algorithm used in Model 4 is:

:E step: In the training data, collect counts, weighted by prior
         probabilities.

         - (a) count how many times a source language word is translated
               into a target language word
         - (b) for a particular word class, count how many times a head
               word is located at a particular displacement from the
               previous cept's center
         - (c) for a particular word class, count how many times a
               non-head word is located at a particular displacement from
               the previous target word
         - (d) count how many times a source word is aligned to phi number
               of target words
         - (e) count how many times NULL is aligned to a target word

:M step: Estimate new probabilities based on the counts from the E step
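
As a rough sketch of the M step for count (b), each head distortion
count is divided by the total count for the same pair of word classes
over all displacements, and the result is floored at a small minimum
probability. The function name, argument names, and ``MIN_PROB`` value
are assumptions made for illustration.

```python
from collections import defaultdict

MIN_PROB = 1.0e-12  # assumed floor so no estimate becomes zero


def maximize_head_distortion(head_counts, head_counts_any_dj):
    """
    head_counts[dj][src_class][trg_class]: weighted count from the E step
    head_counts_any_dj[src_class][trg_class]: the same count summed over dj
    """
    table = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: MIN_PROB))
    )
    for dj, by_src_class in head_counts.items():
        for s_cls, by_trg_class in by_src_class.items():
            for t_cls, count in by_trg_class.items():
                estimate = count / head_counts_any_dj[s_cls][t_cls]
                table[dj][s_cls][t_cls] = max(estimate, MIN_PROB)
    return table


counts = {1: {0: {1: 3.0}}, -1: {0: {1: 1.0}}}
totals = {0: {1: 4.0}}
table = maximize_head_distortion(counts, totals)
print(table[1][0][1], table[-1][0][1])
```

The non-head case follows the same pattern with one fewer level of
nesting, since it conditions on the target word class only.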

As in Model 3, there are too many possible alignments to consider. Thus,
a hill climbing approach is used to sample good candidates.

Notations
---------

:i: Position in the source sentence
     Valid values are 0 (for NULL), 1, 2, ..., length of source sentence
:j: Position in the target sentence
     Valid values are 1, 2, ..., length of target sentence
:l: Number of words in the source sentence, excluding NULL
:m: Number of words in the target sentence
:s: A word in the source language
:t: A word in the target language
:phi: Fertility, the number of target words produced by a source word
:p1: Probability that a target word produced by a source word is
     accompanied by another target word that is aligned to NULL
:p0: 1 - p1
:dj: Displacement, Δj
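
To show how p0, p1, m and the fertility of NULL interact, the following
sketch computes a NULL generation factor of the kind used in Models 3
and 4: p1 raised to the NULL fertility, p0 raised to (m - 2 * NULL
fertility), and a binomial coefficient for choosing which target words
are NULL-generated. The function name is hypothetical, and the formula
is stated as an assumption about the model rather than quoted from this
module.

```python
def null_generation_term(null_fertility, m, p1):
    """Probability factor for generating `null_fertility` NULL-aligned
    words in a target sentence of length m."""
    p0 = 1 - p1
    value = p1 ** null_fertility * p0 ** (m - 2 * null_fertility)
    # Binomial coefficient C(m - null_fertility, null_fertility),
    # accumulated one factor at a time
    for i in range(1, null_fertility + 1):
        value *= (m - null_fertility - i + 1) / i
    return value


# No NULL-aligned words: only the p0 ** m factor remains
print(null_generation_term(0, 4, 0.1))
# One NULL-aligned word among four target words
print(null_generation_term(1, 4, 0.5))
```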

References
----------

Philipp Koehn. 2010. Statistical Machine Translation.
Cambridge University Press, New York.

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and
Robert L. Mercer. 1993. The Mathematics of Statistical Machine
Translation: Parameter Estimation. Computational Linguistics, 19 (2),
263-311.
    Ndefaultdict)	factorial)AlignedSent	AlignmentIBMModel	IBMModel3)Countslongest_target_sentence_lengthc                   ^     e Zd ZdZ	 d
 fd	Z fdZd Zd Zd Zd Z	e
d	             Z xZS )	IBMModel4u  
    Translation model that reorders output words based on their type and
    their distance from other related words in the output sentence.

    >>> bitext = []
    >>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
    >>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
    >>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
    >>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
    >>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
    >>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
    >>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
    >>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
    >>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
    >>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
    >>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }

    >>> ibm4 = IBMModel4(bitext, 5, src_classes, trg_classes)

    >>> print(round(ibm4.translation_table['buch']['book'], 3))
    1.0
    >>> print(round(ibm4.translation_table['das']['book'], 3))
    0.0
    >>> print(round(ibm4.translation_table['ja'][None], 3))
    1.0

    >>> print(round(ibm4.head_distortion_table[1][0][1], 3))
    1.0
    >>> print(round(ibm4.head_distortion_table[2][0][1], 3))
    0.0
    >>> print(round(ibm4.non_head_distortion_table[3][6], 3))
    0.5

    >>> print(round(ibm4.fertility_table[2]['summarize'], 3))
    1.0
    >>> print(round(ibm4.fertility_table[1]['book'], 3))
    1.0

    >>> print(round(ibm4.p1, 3))
    0.033

    >>> test_sentence = bitext[2]
    >>> test_sentence.words
    ['das', 'buch', 'ist', 'ja', 'klein']
    >>> test_sentence.mots
    ['the', 'book', 'is', 'small']
    >>> test_sentence.alignment
    Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])

    Nc                 *   t                                          |           |                                  || _        || _        |Vt          ||          }|j        | _        |j        | _        |j        | _        |j	        | _	        | 
                    |           nN|d         | _        |d         | _        |d         | _        |d         | _	        |d         | _        |d         | _        t          d|          D ]}|                     |           dS )	a  
        Train on ``sentence_aligned_corpus`` and create a lexical
        translation model, distortion models, a fertility model, and a
        model for generating NULL-aligned words.

        Translation direction is from ``AlignedSent.mots`` to
        ``AlignedSent.words``.

        :param sentence_aligned_corpus: Sentence-aligned parallel corpus
        :type sentence_aligned_corpus: list(AlignedSent)

        :param iterations: Number of iterations to run the training algorithm
        :type iterations: int

        :param source_word_classes: Lookup table that maps a source word
            to its word class, the latter represented by an integer id
        :type source_word_classes: dict[str]: int

        :param target_word_classes: Lookup table that maps a target word
            to its word class, the latter represented by an integer id
        :type target_word_classes: dict[str]: int

        :param probability_tables: Optional. Use this to pass in custom
            probability values. If not specified, probabilities will be
            set to a uniform distribution, or some other sensible value.
            If specified, all the following entries must be present:
            ``translation_table``, ``alignment_table``,
            ``fertility_table``, ``p1``, ``head_distortion_table``,
            ``non_head_distortion_table``. See ``IBMModel`` and
            ``IBMModel4`` for the type and purpose of these tables.
        :type probability_tables: dict[str]: object
        Ntranslation_tablealignment_tablefertility_tablep1head_distortion_tablenon_head_distortion_tabler   )super__init__reset_probabilitiessrc_classestrg_classesr	   r   r   r   r   set_uniform_probabilitiesr   r   rangetrain)	selfsentence_aligned_corpus
iterationssource_word_classestarget_word_classesprobability_tablesibm3n	__class__s	           3lib/python3.11/site-packages/nltk/translate/ibm4.pyr   zIBMModel4.__init__   s*   P 	0111  """..%4jAAD%)%;D"#'#7D #'#7D gDG**+BCCCC &88K%LD"#56G#HD #56G#HD (.DG);<S)TD&-?+.D* q*%% 	0 	0AJJ.////	0 	0    c                      t                                                       t           fd           _        	 t           fd           _        d S )Nc                  (    t           fd          S )Nc                  (    t           fd          S )Nc                       j         S NMIN_PROBr   s   r&   <lambda>zSIBMModel4.reset_probabilities.<locals>.<lambda>.<locals>.<lambda>.<locals>.<lambda>   s	    DM r'   r   r/   s   r&   r0   zAIBMModel4.reset_probabilities.<locals>.<lambda>.<locals>.<lambda>   s    4I4I4I4I(J(J r'   r   r/   s   r&   r0   z/IBMModel4.reset_probabilities.<locals>.<lambda>   s    K J J J JKK r'   c                  (    t           fd          S )Nc                       j         S r,   r-   r/   s   r&   r0   zAIBMModel4.reset_probabilities.<locals>.<lambda>.<locals>.<lambda>   s	     r'   r   r/   s   r&   r0   z/IBMModel4.reset_probabilities.<locals>.<lambda>   s    K 5 5 5 566 r'   )r   r   r   r   r   r   r%   s   `r&   r   zIBMModel4.reset_probabilities   sg    ##%%%%0KKKK&
 &
"	 *56666*
 *
&	 	r'   c                    t          |          }|dk    rt          j        ndd|dz
  z  z  t          j        k     r't          j        dt          |          z   dz              t          d|          D ]l}t          fd          | j        |<   t          fd          | j        | <   t          fd          | j	        |<   t          fd          | j	        | <   md	S )
zj
        Set distortion probabilities uniformly to
        1 / cardinality of displacement values
              zA target sentence is too long (z& words). Results may be less accurate.c                  (    t           fd          S )Nc                       S r,    initial_probs   r&   r0   zGIBMModel4.set_uniform_probabilities.<locals>.<lambda>.<locals>.<lambda>      L r'   r   r:   s   r&   r0   z5IBMModel4.set_uniform_probabilities.<locals>.<lambda>      $8$8$8$899 r'   c                  (    t           fd          S )Nc                       S r,   r9   r:   s   r&   r0   zGIBMModel4.set_uniform_probabilities.<locals>.<lambda>.<locals>.<lambda>  r<   r'   r   r:   s   r&   r0   z5IBMModel4.set_uniform_probabilities.<locals>.<lambda>  r=   r'   c                       S r,   r9   r:   s   r&   r0   z5IBMModel4.set_uniform_probabilities.<locals>.<lambda>   s    \ r'   c                       S r,   r9   r:   s   r&   r0   z5IBMModel4.set_uniform_probabilities.<locals>.<lambda>!  s    l r'   N)
r   r   r.   warningswarnstrr   r   r   r   )r   r   max_mdjr;   s       @r&   r   z#IBMModel4.set_uniform_probabilities  s+   
 //FGG A::#,LLUQY0L(+++M1e**:;   5// 	T 	TB-89999. .D&r* /:9999/ /D&s+ 2==Q=Q=Q=Q1R1RD*2.2=>R>R>R>R2S2SD*B3//	T 	Tr'   c           
          t                      }|D ]}t          |j                  }|                     |          \  }}t	          |                                          |_        |                     |          }|D ]}|                     |          }	|	|z  }
t          d|dz             D ]<}|
                    |
||           |                    |
||| j        | j                   =|                    |
|           |                    |
|           | j        }|                                  || _        |                     |           |                     |           |                     |           |                     |           d S )Nr5   )Model4Countslenwordssampler   zero_indexed_alignment	alignmentprob_of_alignmentsprob_t_a_given_sr   update_lexical_translationupdate_distortionr   r   update_null_generationupdate_fertilityr   r   *maximize_lexical_translation_probabilities!maximize_distortion_probabilities maximize_fertility_probabilities&maximize_null_generation_probabilities)r   parallel_corpuscountsaligned_sentencemsampled_alignmentsbest_alignmenttotal_countalignment_infocountnormalized_countjexisting_alignment_tables                r&   r   zIBMModel4.train#  s    / 	J 	J$*++A 26=M1N1N.)25577* *&
 112DEEK #5 J J--n==#(;#6 q!a% 
 
A55(.!   ,,(&((    --.>OOO''(8.IIII#J* $(#7   """777???..v666--f55533F;;;;;r'   c                    | j         }|j                                        D ]z\  }}|                                D ]`\  }}|D ]X}|j        |         |         |         |j        |         |         z  }t	          |t
          j                  ||         |         |<   Ya{| j        }	|j                                        D ]N\  }}|D ]F}|j        |         |         |j	        |         z  }t	          |t
          j                  |	|         |<   GOd S r,   )
r   head_distortionitemshead_distortion_for_any_djmaxr   r.   r   non_head_distortionnon_head_distortion_for_any_dj)
r   rY   head_d_tablerF   r   s_clsr   t_clsestimatenon_head_d_tables
             r&   rU   z+IBMModel4.maximize_distortion_probabilitiesQ  s]   1%5;;== 	V 	VOB&1&7&7&9&9 V V"{( V VE.r259%@ ;EB5IJ  698CT5U5UL$U+E22VV  9%9??AA 	O 	OOB$ O O.r259;EBC  /2(H<M.N.N $U++O	O 	Or'   c                 8    t                               ||           S )zc
        Probability of target sentence and an alignment given the
        source sentence
        )r   model4_prob_t_a_given_s)r   r_   s     r&   rO   zIBMModel4.prob_t_a_given_se  s    
 00FFFr'   c                 Z    d}t           j         fd} fd} fd} fd}| |            z  }|k     rS | |            z  }|k     rS t          dt           j                            D ]2}| ||          z  }|k     rc S | ||          z  }|k     rc S 3|S )N      ?c                  .   d} j         }d|z
  }                    d          }t          j                  dz
  }| t	          ||          t	          ||d|z  z
            z  z  } | k     rS t          d|dz             D ]}| ||z
  |z
  dz   |z  z  } | S )Nrs   r5   r   r6   )r   fertility_of_irI   trg_sentencepowr   )	valuer   p0null_fertilityr[   ir.   r_   	ibm_models	         r&   null_generation_termz?IBMModel4.model4_prob_t_a_given_s.<locals>.null_generation_termq  s    EBRB+::1==NN/0014AS^,,s2q1~;M7M/N/NNNEx 1nq011 : :!n,q01499Lr'   c                      d} j         }t          dt          |                    D ]M}                    |          }| t	          |          j        |         ||                  z  z  } | k     rc S N| S )Nrs   r5   )src_sentencer   rI   ru   r   r   )rx   r   r{   	fertilityr.   r_   r|   s       r&   fertility_termz9IBMModel4.model4_prob_t_a_given_s.<locals>.fertility_term  s    E)6L1c,//00 $ $*99!<<	i((/	:<?KL 8###OOO $Lr'   c                 x    j         |          }j        |          }j        |         }j        |         |         S r,   )rv   rM   r   r   )rb   tr{   sr_   r|   s       r&   lexical_translation_termzCIBMModel4.model4_prob_t_a_given_s.<locals>.lexical_translation_term  s=    +A.A(+A+A.A.q1!44r'   c                    	j         |          }	j        |          }|dk    rdS 	                    |           rq	                    |           }d }|	j        |         }
j        |         }
j        |         }| 	                    |          z
  }
j        |         |         |         S 		                    |           }
j        |         }| |z
  }
j
        |         |         S )Nr   rs   )rv   rM   is_head_wordprevious_ceptr   r   r   center_of_ceptr   previous_in_tabletr   )rb   r   r{   r   	src_class
previous_s	trg_classrF   previous_positionr_   r|   s            r&   distortion_termz:IBMModel4.model4_prob_t_a_given_s.<locals>.distortion_term  s    +A.A(+AAvvs**1-- 	Q . < <Q ? ? 	 ,!/!<]!KJ ) 5j AI%1!4	66}EEE 6r:9EiPP !/ A A! D D!-a0I&&B6r:9EEr'   r5   )r   r.   r   rI   rv   )	r_   r|   probabilityr}   r   r   r   rb   r.   s	   ``      @r&   rq   z!IBMModel4.model4_prob_t_a_given_sl  se   $	 	 	 	 	 	 	 	 	 	 	 	 	 		5 	5 	5 	5 	5 	5	F 	F 	F 	F 	F 	F6 	++---!!O~~'''!!Oq#n9::;; 	  	 A33A666KX%%??1---KX%% & r'   r,   )__name__
__module____qualname____doc__r   r   r   r   rU   rO   staticmethodrq   __classcell__r%   s   @r&   r   r   v   s        1 1r  A0 A0 A0 A0 A0 A0F    ( T  T  TD,< ,< ,<\O O O(G G G S S \S S S S Sr'   r   c                   (     e Zd ZdZ fdZd Z xZS )rH   zp
    Data object to store counts of various parameters during training.
    Includes counts for distortion.
    c                     t                                                       t          d           | _        t          d           | _        t          d           | _        t          d           | _        d S )Nc                  "    t          d           S )Nc                  "    t          d           S )Nc                      dS Ng        r9   r9   r'   r&   r0   zKModel4Counts.__init__.<locals>.<lambda>.<locals>.<lambda>.<locals>.<lambda>  s    C r'   r   r9   r'   r&   r0   z9Model4Counts.__init__.<locals>.<lambda>.<locals>.<lambda>  s    KK(@(@ r'   r   r9   r'   r&   r0   z'Model4Counts.__init__.<locals>.<lambda>  s    K @ @AA r'   c                  "    t          d           S )Nc                      dS r   r9   r9   r'   r&   r0   z9Model4Counts.__init__.<locals>.<lambda>.<locals>.<lambda>  s    RU r'   r   r9   r'   r&   r0   z'Model4Counts.__init__.<locals>.<lambda>  s    k++>V>V r'   c                  "    t          d           S )Nc                      dS r   r9   r9   r'   r&   r0   z9Model4Counts.__init__.<locals>.<lambda>.<locals>.<lambda>  s    3 r'   r   r9   r'   r&   r0   z'Model4Counts.__init__.<locals>.<lambda>  s    {;;7O7O r'   c                      dS r   r9   r9   r'   r&   r0   z'Model4Counts.__init__.<locals>.<lambda>  s    # r'   )r   r   r   re   rg   ri   rj   r3   s    r&   r   zModel4Counts.__init__  ss    *AA 
  
 +66V6V*W*W'#./O/O#P#P .9++.F.F+++r'   c                 2   |j         |         }|j        |         }|dk    rd S |                    |          r|                    |          }||j        |         }	||	         }
nd }
||         }||                    |          z
  }| j        |         |
         |xx         |z  cc<   | j        |
         |xx         |z  cc<   d S |                    |          }||         }||z
  }| j	        |         |xx         |z  cc<   | j
        |xx         |z  cc<   d S )Nr   )rM   rv   r   r   r   r   re   rg   r   ri   rj   )r   r`   r_   rb   r   r   r{   r   r   previous_src_wordr   r   rF   
previous_js                 r&   rQ   zModel4Counts.update_distortion  s^   $Q''*66D((++ 	D*88;;M($2$?$N!'(9:		 	#AI^22=AAAB $Y/	:::eC:::+I6yAAAUJAAAAA (::1==J#AIZB$R(333u<333/	:::eC:::::r'   )r   r   r   r   r   rQ   r   r   s   @r&   rH   rH     s]         
G G G G GD D D D D D Dr'   rH   )r   rB   collectionsr   mathr   nltk.translater   r   r   r	   nltk.translate.ibm_modelr