#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Introduction
------------
Learn word representations via fastText: `Enriching Word Vectors with Subword Information
<https://arxiv.org/abs/1607.04606>`_.

This module allows training word embeddings from a training corpus with the additional ability to obtain word vectors
for out-of-vocabulary words.

This module contains a fast native C implementation of fastText with Python interfaces. It is **not** only a wrapper
around Facebook's implementation.

This module supports loading models trained with Facebook's fastText implementation.
It also supports continuing training from such models.

For a tutorial see :ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`.


Usage examples
--------------

Initialize and train a model:

.. sourcecode:: pycon

    >>> from gensim.models import FastText
    >>> from gensim.test.utils import common_texts  # some example sentences
    >>>
    >>> print(common_texts[0])
    ['human', 'interface', 'computer']
    >>> print(len(common_texts))
    9
    >>> model = FastText(vector_size=4, window=3, min_count=1)  # instantiate
    >>> model.build_vocab(corpus_iterable=common_texts)
    >>> model.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)  # train

Once you have a model, you can access its keyed vectors via the `model.wv` attribute.
The keyed vectors instance is quite powerful: it can perform a wide range of NLP tasks.
For a full list of examples, see :class:`~gensim.models.keyedvectors.KeyedVectors`.

You can also pass all the above parameters to the constructor to do everything
in a single line:

.. sourcecode:: pycon

    >>> model2 = FastText(vector_size=4, window=3, min_count=1, sentences=common_texts, epochs=10)

The two models above are instantiated differently, but behave identically.
For example, we can compare the embeddings they've calculated for the word "computer":

.. sourcecode:: pycon

    >>> import numpy as np
    >>>
    >>> np.allclose(model.wv['computer'], model2.wv['computer'])
    True


In the above examples, we trained the model from sentences (lists of words) loaded into memory.
This is OK for smaller datasets, but for larger datasets, we recommend streaming the file,
for example from disk or the network.
In Gensim, we refer to such datasets as "corpora" (singular "corpus"), and keep them
in the format described in :class:`~gensim.models.word2vec.LineSentence`.
Passing a corpus is simple:

.. sourcecode:: pycon

    >>> from gensim.test.utils import datapath
    >>>
    >>> corpus_file = datapath('lee_background.cor')  # absolute path to corpus
    >>> model3 = FastText(vector_size=4, window=3, min_count=1)
    >>> model3.build_vocab(corpus_file=corpus_file)  # scan over corpus to build the vocabulary
    >>>
    >>> total_words = model3.corpus_total_words  # number of words in the corpus
    >>> model3.train(corpus_file=corpus_file, total_words=total_words, epochs=5)

The model needs the `total_words` parameter in order to
manage the training rate (alpha) correctly, and to give accurate progress estimates.
The above example relies on an implementation detail: the
:meth:`~gensim.models.fasttext.FastText.build_vocab` method
sets the `corpus_total_words` (and also `corpus_count`) model attributes.
You may calculate them by scanning over the corpus yourself, too.
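
For instance, a single pass over a corpus in :class:`~gensim.models.word2vec.LineSentence`
format is enough. The loop below is only a sketch of that bookkeeping:

.. sourcecode:: pycon

    >>> from gensim.models.word2vec import LineSentence
    >>>
    >>> corpus_count = 0
    >>> total_words = 0
    >>> for sentence in LineSentence(corpus_file):
    ...     corpus_count += 1
    ...     total_words += len(sentence)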

If you have a corpus in a different format, then you can use it by wrapping it
in an `iterator <https://wiki.python.org/moin/Iterator>`_.
Your iterator should yield one list of strings at a time, where each string is a separate word.
Gensim will take care of the rest:

.. sourcecode:: pycon

    >>> from gensim.utils import tokenize
    >>> from gensim import utils
    >>>
    >>>
    >>> class MyIter:
    ...     def __iter__(self):
    ...         path = datapath('crime-and-punishment.txt')
    ...         with utils.open(path, 'r', encoding='utf-8') as fin:
    ...             for line in fin:
    ...                 yield list(tokenize(line))
    >>>
    >>>
    >>> model4 = FastText(vector_size=4, window=3, min_count=1)
    >>> model4.build_vocab(corpus_iterable=MyIter())
    >>> total_examples = model4.corpus_count
    >>> model4.train(corpus_iterable=MyIter(), total_examples=total_examples, epochs=5)

Persist a model to disk with:

.. sourcecode:: pycon

    >>> from gensim.test.utils import get_tmpfile
    >>>
    >>> fname = get_tmpfile("fasttext.model")
    >>>
    >>> model.save(fname)
    >>> model = FastText.load(fname)

Once loaded, such models behave identically to those created from scratch.
For example, you can continue training the loaded model:

.. sourcecode:: pycon

    >>> import numpy as np
    >>>
    >>> 'computation' in model.wv.key_to_index  # New word, currently out of vocab
    False
    >>> old_vector = np.copy(model.wv['computation'])  # Grab the existing vector
    >>> new_sentences = [
    ...     ['computer', 'aided', 'design'],
    ...     ['computer', 'science'],
    ...     ['computational', 'complexity'],
    ...     ['military', 'supercomputer'],
    ...     ['central', 'processing', 'unit'],
    ...     ['onboard', 'car', 'computer'],
    ... ]
    >>>
    >>> model.build_vocab(new_sentences, update=True)  # Update the vocabulary
    >>> model.train(new_sentences, total_examples=len(new_sentences), epochs=model.epochs)
    >>>
    >>> new_vector = model.wv['computation']
    >>> np.allclose(old_vector, new_vector, atol=1e-4)  # Vector has changed, model has learnt something
    False
    >>> 'computation' in model.wv.key_to_index  # Word is still out of vocab
    False

.. Important::
    Be sure to call the :meth:`~gensim.models.fasttext.FastText.build_vocab`
    method with `update=True` before the :meth:`~gensim.models.fasttext.FastText.train` method
    when continuing training.  Without this call, previously unseen terms
    will not be added to the vocabulary.

You can also load models trained with Facebook's fastText implementation:

.. sourcecode:: pycon

    >>> cap_path = datapath("crime-and-punishment.bin")
    >>> fb_model = load_facebook_model(cap_path)

Once loaded, such models behave identically to those trained from scratch.
You may continue training them on new data:

.. sourcecode:: pycon

    >>> 'computer' in fb_model.wv.key_to_index  # New word, currently out of vocab
    False
    >>> old_computer = np.copy(fb_model.wv['computer'])  # Calculate current vectors
    >>> fb_model.build_vocab(new_sentences, update=True)
    >>> fb_model.train(new_sentences, total_examples=len(new_sentences), epochs=fb_model.epochs)
    >>> new_computer = fb_model.wv['computer']
    >>> np.allclose(old_computer, new_computer, atol=1e-4)  # Vector has changed, model has learnt something
    False
    >>> 'computer' in fb_model.wv.key_to_index  # New word is now in the vocabulary
    True

If you do not intend to continue training the model, consider using the
:func:`gensim.models.fasttext.load_facebook_vectors` function instead.
That function only loads the word embeddings (keyed vectors), consuming much less CPU and RAM:

.. sourcecode:: pycon

    >>> from gensim.test.utils import datapath
    >>>
    >>> cap_path = datapath("crime-and-punishment.bin")
    >>> wv = load_facebook_vectors(cap_path)
    >>>
    >>> 'landlord' in wv.key_to_index  # Word is out of vocabulary
    False
    >>> oov_vector = wv['landlord']  # Even OOV words have vectors in FastText
    >>>
    >>> 'landlady' in wv.key_to_index  # Word is in the vocabulary
    True
    >>> iv_vector = wv['landlady']

Retrieve word vectors for an in-vocabulary and an out-of-vocabulary word:

.. sourcecode:: pycon

    >>> existent_word = "computer"
    >>> existent_word in model.wv.key_to_index
    True
    >>> computer_vec = model.wv[existent_word]  # numpy vector of a word
    >>>
    >>> oov_word = "graph-out-of-vocab"
    >>> oov_word in model.wv.key_to_index
    False
    >>> oov_vec = model.wv[oov_word]  # numpy vector for OOV word
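
Note that OOV lookups only work when the model keeps character ngrams (the default).
If the model was built without them (by setting `max_n` lower than `min_n`, so that no
buckets are allocated), requesting an OOV word raises a `KeyError`. A defensive lookup,
as a minimal sketch:

.. sourcecode:: pycon

    >>> def vector_or_none(wv, word):
    ...     try:
    ...         return wv[word]
    ...     except KeyError:  # no vocab entry and no ngram buckets to synthesize a vector from
    ...         return None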

You can perform various NLP word tasks with the model; some of them are already built-in:

.. sourcecode:: pycon

    >>> similarities = model.wv.most_similar(positive=['computer', 'human'], negative=['interface'])
    >>> most_similar = similarities[0]
    >>>
    >>> similarities = model.wv.most_similar_cosmul(positive=['computer', 'human'], negative=['interface'])
    >>> most_similar = similarities[0]
    >>>
    >>> not_matching = model.wv.doesnt_match("human computer interface tree".split())
    >>>
    >>> sim_score = model.wv.similarity('computer', 'human')

Correlation with human opinion on word similarity:

.. sourcecode:: pycon

    >>> from gensim.test.utils import datapath
    >>>
    >>> similarities = model.wv.evaluate_word_pairs(datapath('wordsim353.tsv'))
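
The call returns a 3-tuple: the Pearson and Spearman correlations with the human
judgements (each a statistic/p-value pair), plus the percentage of word pairs with
unknown words. Unpacking it, as a sketch:

.. sourcecode:: pycon

    >>> pearson, spearman, oov_ratio = similarities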

And on word analogies:

.. sourcecode:: pycon

    >>> analogies_result = model.wv.evaluate_word_analogies(datapath('questions-words.txt'))

    N)onesvstackfloat32)Word2Vec)KeyedVectorsprep_vectors)utils)
deprecated)train_batch_anyMAX_WORDS_IN_BATCHcompute_ngramscompute_ngrams_bytesft_hash_bytes)train_epoch_sgtrain_epoch_cbowc                   R    e Zd Zdddddddddddddd	dd
dedddddddedddf fd	Zd Z fdZddZ	 ddZ	d Z
 ed          d d            Ze ej        d          d!d                        Z ej        d          d!d            Z fdZe fd            Z fdZ xZS )"FastTextNr   d   g?      gMbP?   g-C6?g      ?   i  Tc                    t           j        | _        t           j        | _        || _        |
dk    rt          d          |
| _        ||k     rd}t          ||||          | _        t          dt                    | j        _        t          dt                    | j        _        t          t          |                               ||||||||||||	|||||||||||||           dS )uf#  Train, use and evaluate word representations learned using the method
        described in `Enriching Word Vectors with Subword Information <https://arxiv.org/abs/1607.04606>`_,
        aka FastText.

        The model can be stored/loaded via its :meth:`~gensim.models.fasttext.FastText.save` and
        :meth:`~gensim.models.fasttext.FastText.load` methods, or loaded from a format compatible with the
        original Fasttext implementation via :func:`~gensim.models.fasttext.load_facebook_model`.

        Parameters
        ----------
        sentences : iterable of list of str, optional
            Can be simply a list of lists of tokens, but for larger corpora,
            consider an iterable that streams the sentences directly from disk/network.
            See :class:`~gensim.models.word2vec.BrownCorpus`, :class:`~gensim.models.word2vec.Text8Corpus'
            or :class:`~gensim.models.word2vec.LineSentence` in :mod:`~gensim.models.word2vec` module for such
            examples. If you don't supply `sentences`, the model is left uninitialized -- use if you plan to
            initialize it in some other way.
        corpus_file : str, optional
            Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
            You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
            `corpus_file` arguments need to be passed (or none of them, in that case, the model is left
            uninitialized).
        min_count : int, optional
            The model ignores all words with total frequency lower than this.
        vector_size : int, optional
            Dimensionality of the word vectors.
        window : int, optional
            The maximum distance between the current and predicted word within a sentence.
        workers : int, optional
            Use these many worker threads to train the model (=faster training with multicore machines).
        alpha : float, optional
            The initial learning rate.
        min_alpha : float, optional
            Learning rate will linearly drop to `min_alpha` as training progresses.
        sg : {1, 0}, optional
            Training algorithm: skip-gram if `sg=1`, otherwise CBOW.
        hs : {1,0}, optional
            If 1, hierarchical softmax will be used for model training.
            If set to 0, and `negative` is non-zero, negative sampling will be used.
        seed : int, optional
            Seed for the random number generator. Initial vectors for each word are seeded with a hash of
            the concatenation of word + `str(seed)`. Note that for a fully deterministically-reproducible run,
            you must also limit the model to a single worker thread (`workers=1`), to eliminate ordering jitter
            from OS thread scheduling. (In Python 3, reproducibility between interpreter launches also requires
            use of the `PYTHONHASHSEED` environment variable to control hash randomization).
        max_vocab_size : int, optional
            Limits the RAM during vocabulary building; if there are more unique
            words than this, then prune the infrequent ones. Every 10 million word types need about 1GB of RAM.
            Set to `None` for no limit.
        sample : float, optional
            The threshold for configuring which higher-frequency words are randomly downsampled,
            useful range is (0, 1e-5).
        negative : int, optional
            If > 0, negative sampling will be used, the int for negative specifies how many "noise words"
            should be drawn (usually between 5-20).
            If set to 0, no negative sampling is used.
        ns_exponent : float, optional
            The exponent used to shape the negative sampling distribution. A value of 1.0 samples exactly in proportion
            to the frequencies, 0.0 samples all words equally, while a negative value samples low-frequency words more
            than high-frequency words. The popular default value of 0.75 was chosen by the original Word2Vec paper.
            More recently, in https://arxiv.org/abs/1804.04212, Caselles-Dupré, Lesaint, & Royo-Letelier suggest that
            other values may perform better for recommendation applications.
        cbow_mean : {1,0}, optional
            If 0, use the sum of the context word vectors. If 1, use the mean, only applies when cbow is used.
        hashfxn : function, optional
            Hash function to use to randomly initialize weights, for increased training reproducibility.
        epochs : int, optional
            Number of iterations (epochs) over the corpus.
        trim_rule : function, optional
            Vocabulary trimming rule, specifies whether certain words should remain in the vocabulary,
            be trimmed away, or handled using the default (discard if word count < min_count).
            Can be None (min_count will be used, look to :func:`~gensim.utils.keep_vocab_item`),
            or a callable that accepts parameters (word, count, min_count) and returns either
            :attr:`gensim.utils.RULE_DISCARD`, :attr:`gensim.utils.RULE_KEEP` or :attr:`gensim.utils.RULE_DEFAULT`.
            The rule, if given, is only used to prune vocabulary during
            :meth:`~gensim.models.fasttext.FastText.build_vocab` and is not stored as part of the model.

            The input parameters are of the following types:
                * `word` (str) - the word we are examining
                * `count` (int) - the word's frequency count in the corpus
                * `min_count` (int) - the minimum count threshold.

        sorted_vocab : {1,0}, optional
            If 1, sort the vocabulary by descending frequency before assigning word indices.
        batch_words : int, optional
            Target size (in words) for batches of examples passed to worker threads (and
            thus cython routines). (Larger batches will be passed if individual
            texts are longer than 10000 words, but the standard cython code truncates to that maximum.)
        min_n : int, optional
            Minimum length of char n-grams to be used for training word representations.
        max_n : int, optional
            Max length of char ngrams to be used for training word representations. Set `max_n` to be
            lesser than `min_n` to avoid char ngrams being used.
        word_ngrams : int, optional
            In Facebook's FastText, "max length of word ngram" - but gensim only supports the
            default of 1 (regular unigram word handling).
        bucket : int, optional
            Character ngrams are hashed into a fixed number of buckets, in order to limit the
            memory usage of the model. This option specifies the number of buckets used by the model.
            The default value of 2000000 consumes as much memory as having 2000000 more in-vocabulary
            words in your model.
        callbacks : :obj: `list` of :obj: `~gensim.models.callbacks.CallbackAny2Vec`, optional
            List of callbacks that need to be executed/run at specific stages during training.
        max_final_vocab : int, optional
            Limits the vocab to a target vocab size by automatically selecting
            ``min_count``.  If the specified ``min_count`` is more than the
            automatically calculated ``min_count``, the former will be used.
            Set to ``None`` if not required.
        shrink_windows : bool, optional
            New in 4.1. Experimental.
            If True, the effective window size is uniformly sampled from [1, `window`]
            for each target word during training, to match the original word2vec algorithm's
            approximate weighting of context words by distance. Otherwise, the effective
            window size is always fixed to `window` words to either side.

        Examples
        --------
        Initialize and train a `FastText` model:

        .. sourcecode:: pycon

            >>> from gensim.models import FastText
            >>> sentences = [["cat", "say", "meow"], ["dog", "say", "woof"]]
            >>>
            >>> model = FastText(sentences, min_count=1)
            >>> say_vector = model.wv['say']  # get vector for word
            >>> of_vector = model.wv['of']  # get vector for out-of-vocab word

        Attributes
        ----------
        wv : :class:`~gensim.models.fasttext.FastTextKeyedVectors`
            This object essentially contains the mapping between words and embeddings. These are similar to
            the embedding computed in the :class:`~gensim.models.word2vec.Word2Vec`, however here we also
            include vectors for n-grams. This allows the model to compute embeddings even for **unseen**
            words (that do not exist in the vocabulary), as the aggregate of the n-grams included in the word.
            After training the model, this attribute can be used directly to query those embeddings in various
            ways. Check the module level docstring for some examples.

        r   zGGensim's FastText implementation does not yet support word_ngrams != 1.r   dtype)	sentencescorpus_fileworkersvector_sizeepochs	callbacksbatch_words	trim_rulesgalphawindowmax_vocab_sizemax_final_vocab	min_countsamplesorted_vocab	null_wordns_exponenthashfxnseedhsnegative	cbow_mean	min_alphashrink_windowsN)r	   call_on_class_onlyloadload_fasttext_formatr"   NotImplementedErrorword_ngramsFastTextKeyedVectorswvr   REALvectors_vocab_lockfvectors_ngrams_lockfsuperr   __init__)selfr   r   r%   r1   r    r&   r'   r*   r(   r:   r+   r0   r   r4   r2   r.   r3   r/   r!   r-   min_nmax_nr,   bucketr$   r#   r"   r)   r5   	__class__s                                 6lib/python3.11/site-packages/gensim/models/fasttext.pyrA   zFastText.__init__  s   b ,	$)$<!"! 	q%&oppp&5= 	F&{E5&II '+1D&9&9&9#'+AT':':':$h&&['Wbkq[IRT\ajp)?\['"x9 	' 	@ 	@ 	@ 	@ 	@    c                 ~   t          | j        j                  }t          | j                  }| j        j        }|dk    s
J d            |dk    s
J d            t	          dt
                    | j        _        t	          dt
                    | j        _        | j        r|| _	        | j
        r|| _        || _        d S )Nr   z.expected num_vectors to be initialized alreadyz-expected vocab_size to be initialized alreadyr   r   )lenr<   vectorsr    r   r=   r?   r>   r1   syn1r2   syn1neglayer1_size)rB   hidden_outputnum_vectors
vocab_sizer    s        rG   _init_post_loadzFastText._init_post_load  s    $'/**\\
g)QPP PPPPA~NNNNNN (,AT':':':$&*1D&9&9&9#7 	&%DI= 	)(DL&rH   c                     t          t          |                                            | j                                         dS )z;Clear any cached values that training may have invalidated.N)r@   r   _clear_post_trainr<   adjust_vectors)rB   rF   s    rG   rT   zFastText._clear_post_train  s9    h//111     rH   c                    |pt          | j                  }| j        t          j        t          j                  j        z  }| j        t          j        t          j                  j        z  }|pi }t          | j                  | j        rdndz  |d<   t          | j                  |z  |d<   | j        j	        }| j        rt          | j                  |z  |d<   | j
        rt          | j                  |z  |d<   | j        j	        r| j        j	        |z  |d<   d}| j        j        D ]D}t          || j        j        | j        j        | j        j	                  }|t          |          z  }Ed	d
t          | j                  z  z   d|z  z   |d<   t          |                                          |d<   t"                              dt          | j                  || j        |d                    |S )zUEstimate memory that will be needed to train a model, and print the estimates to log.i  i  vocab
syn0_vocabrL   rM   syn0_ngramsr   @   r      buckets_wordtotalzNestimated required memory for %i words, %i buckets and %i dimensions: %i bytes)rJ   r<   r    npr   r   itemsizerN   r1   rE   r2   key_to_indexft_ngram_hashesrC   rD   sumvaluesloggerinfo)	rB   rQ   reportvec_sizel1_sizenum_buckets
num_ngramswordhashess	            rG   estimate_memoryzFastText.estimate_memory  s   /3tw<<
#bhrz&:&:&CC"RXbj%9%9%BB2dg,,*A##cBw"47||h6|gn7 	4 \\G3F6N= 	7 #DGw 6F97> 		R$(GNX$=F=!J, * *(tw}dgmTW^\\c&kk)

 &(3TW+=%>!j.%QF>"fmmoo..w\LL+t'7	
 	
 	
 rH   c	                     |\  }
}| j         rt          | |||||||
|	  	        \  }}}nt          | |||||||
|	  	        \  }}}|||fS N)r%   r   r   )rB   r   	thread_idoffsetcython_vocabthread_private_mem	cur_epochtotal_examplestotal_wordskwargsworkneu1examplestally	raw_tallys                  rG   _do_train_epochzFastText._do_train_epoch  s     (
d7 	)7k6<NT_aegk* *&HeYY *:k6<NT_aegk* *&HeY 	))rH   c                 `    |\  }}t          | ||||          }||                     |          fS )a  Train a single batch of sentences. Return 2-tuple `(effective word count after
        ignoring unknown words and sentence length trimming, total word count)`.

        Parameters
        ----------
        sentences : iterable of list of str
            Can be simply a list of lists of tokens, but for larger corpora,
            consider an iterable that streams the sentences directly from disk/network.
            See :class:`~gensim.models.word2vec.BrownCorpus`, :class:`~gensim.models.word2vec.Text8Corpus`
            or :class:`~gensim.models.word2vec.LineSentence` in :mod:`~gensim.models.word2vec` module for such examples.
        alpha : float
            The current learning rate.
        inits : tuple of (:class:`numpy.ndarray`, :class:`numpy.ndarray`)
            Each worker's private work memory.

        Returns
        -------
        (int, int)
            Tuple of (effective word count after ignoring unknown words and sentence length trimming, total word count)

        )r   _raw_word_count)rB   r   r&   initsrx   ry   r{   s          rG   _do_train_jobzFastText._do_train_job  s;    , 
didCCd**95555rH   zGensim 4.0.0 implemented internal optimizations that make calls to init_sims() unnecessary. init_sims() is now obsoleted and will be completely removed in future versions. See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4Fc                 <    | j                             |           dS )a  
        Precompute L2-normalized vectors. Obsoleted.

        If you need a single unit-normalized vector for some key, call
        :meth:`~gensim.models.keyedvectors.KeyedVectors.get_vector` instead:
        ``fasttext_model.wv.get_vector(key, norm=True)``.

        To refresh norms after you performed some atypical out-of-band vector tampering,
        call `:meth:`~gensim.models.keyedvectors.KeyedVectors.fill_norms()` instead.

        Parameters
        ----------
        replace : bool
            If True, forget the original trained vectors and only keep the normalized ones.
            You lose information if you do this.

        )replaceN)r<   	init_sims)rB   r   s     rG   r   zFastText.init_sims  s#    . 	'*****rH   zuse load_facebook_vectors (to use pretrained embeddings) or load_facebook_model (to continue training with the loaded full model, more RAM) insteadutf8c                 $    t          ||          S )zDeprecated.

        Use :func:`gensim.models.fasttext.load_facebook_model` or
        :func:`gensim.models.fasttext.load_facebook_vectors` instead.

        encoding)load_facebook_model)cls
model_filer   s      rG   r8   zFastText.load_fasttext_format8  s     #:AAAArH   c                     t          | j        |          }|j                                        D ]\  }}t	          | ||           dS )zLoad data from a binary file created by Facebook's native FastText.

        Parameters
        ----------
        encoding : str, optional
            Specifies the encoding.

        r   N)_load_fasttext_format	file_name__dict__itemssetattr)rB   r   mattrvals        rG   load_binary_datazFastText.load_binary_dataF  sX     "$.8DDD))++ 	% 	%ID#D$$$$$	% 	%rH   c                 H     t          t          |           j        |i | dS )a  Save the Fasttext model. This saved model can be loaded again using
        :meth:`~gensim.models.fasttext.FastText.load`, which supports incremental training
        and getting vectors for out-of-vocabulary words.

        Parameters
        ----------
        fname : str
            Store the model to this file.

        See Also
        --------
        :meth:`~gensim.models.fasttext.FastText.load`
            Load :class:`~gensim.models.fasttext.FastText` model.

        N)r@   r   saverB   argsrw   rF   s      rG   r   zFastText.saveW  s-      	#h"D3F33333rH   c                 H     t          t          |           j        |ddi|S )a  Load a previously saved `FastText` model.

        Parameters
        ----------
        fname : str
            Path to the saved file.

        Returns
        -------
        :class:`~gensim.models.fasttext.FastText`
            Loaded model.

        See Also
        --------
        :meth:`~gensim.models.fasttext.FastText.save`
            Save :class:`~gensim.models.fasttext.FastText` model.

        rethrowT)r@   r   r7   )r   r   rw   rF   s      rG   r7   zFastText.loadi  s,    ( )uXs##($GGGGGrH   c                      t          t          |           j        |i | t          | d          r| j        | j        _        | `dS dS )XHandle special requirements of `.load()` protocol, usually up-converting older versions.rE   N)r@   r   _load_specialshasattrrE   r<   r   s      rG   r   zFastText._load_specials  sU    ,h,d=f===4"" 	![DGN	 	rH   )NNF)r   )__name__
__module____qualname__hashr   rA   rR   rT   rm   r}   r   r
   r   classmethodr	   r8   r   r   r7   r   __classcell__rF   s   @rG   r   r     s       !%4A!QT\aQ $!DqRS_eDQR^_ghpq4M_km!%dh@ h@ h@ h@ h@ h@T' ' '(! ! ! ! !
   @ .2* * * *"6 6 66 Z	^ 
+ + + 
+( U	N B B B	  [
B U	N % % %	 %4 4 4 4 4$ H H H H [H*        rH   r   c                       e Zd ZdZdS )FastTextVocabzmThis is a redundant class. It exists only to maintain backwards compatibility
    with older gensim versions.Nr   r   r   __doc__r   rH   rG   r   r     s        # # # #rH   r   c                       e Zd ZdZdS )FastTextTrainablesz7Obsolete class retained for backward-compatible load()sNr   r   rH   rG   r   r     s        AAAArH   r   c                     t          |           |k    r!t          d|t          |           fz            t          j        |t                    }| |dt          |           <   |S )z3Pad array with additional entries filled with ones.z5the new number of rows %i must be greater than old %ir   N)rJ   
ValueErrorr^   r   r=   )r   new_lennew_arrs      rG   	_pad_onesr     sg    
1vv fPT[]`ab]c]cSddeeeggT***GGGSVVGNrH   utf-8c                 &    t          | |d          S )a  Load the model from Facebook's native fasttext `.bin` output file.

    Notes
    ------
    Facebook provides both `.vec` and `.bin` files with their modules.
    The former contains human-readable vectors.
    The latter contains machine-readable vectors along with other model parameters.
    This function requires you to **provide the full path to the .bin file**.
    It effectively ignores the `.vec` output file, since it is redundant.

    This function uses the smart_open library to open the path.
    The path may be on a remote host (e.g. HTTP, S3, etc).
    It may also be gzip or bz2 compressed (i.e. end in `.bin.gz` or `.bin.bz2`).
    For details, see `<https://github.com/RaRe-Technologies/smart_open>`__.

    Parameters
    ----------
    model_file : str
        Path to the FastText output files.
        FastText outputs two model files - `/path/to/model.vec` and `/path/to/model.bin`
        Expected value for this example: `/path/to/model` or `/path/to/model.bin`,
        as Gensim requires only `.bin` file to the load entire fastText model.
    encoding : str, optional
        Specifies the file encoding.

    Examples
    --------

    Load, infer, continue training:

    .. sourcecode:: pycon

        >>> from gensim.test.utils import datapath
        >>>
        >>> cap_path = datapath("crime-and-punishment.bin")
        >>> fb_model = load_facebook_model(cap_path)
        >>>
        >>> 'landlord' in fb_model.wv.key_to_index  # Word is out of vocabulary
        False
        >>> oov_term = fb_model.wv['landlord']
        >>>
        >>> 'landlady' in fb_model.wv.key_to_index  # Word is in the vocabulary
        True
        >>> iv_term = fb_model.wv['landlady']
        >>>
        >>> new_sent = [['lord', 'of', 'the', 'rings'], ['lord', 'of', 'the', 'flies']]
        >>> fb_model.build_vocab(new_sent, update=True)
        >>> fb_model.train(sentences=new_sent, total_examples=len(new_sent), epochs=5)

    Returns
    -------
    gensim.models.fasttext.FastText
        The loaded model.

    See Also
    --------
    :func:`~gensim.models.fasttext.load_facebook_vectors` loads
    the word embeddings only.  Its faster, but does not enable you to continue
    training.

    Tr   
full_model)r   )pathr   s     rG   r   r     s    | !TJJJJrH   c                 4    t          | |d          }|j        S )ar  Load word embeddings from a model saved in Facebook's native fasttext `.bin` format.

    Notes
    ------
    Facebook provides both `.vec` and `.bin` files with their modules.
    The former contains human-readable vectors.
    The latter contains machine-readable vectors along with other model parameters.
    This function requires you to **provide the full path to the .bin file**.
    It effectively ignores the `.vec` output file, since it is redundant.

    This function uses the smart_open library to open the path.
    The path may be on a remote host (e.g. HTTP, S3, etc).
    It may also be gzip or bz2 compressed.
    For details, see `<https://github.com/RaRe-Technologies/smart_open>`__.

    Parameters
    ----------
    path : str
        The location of the model file.
    encoding : str, optional
        Specifies the file encoding.

    Returns
    -------
    gensim.models.fasttext.FastTextKeyedVectors
        The word embeddings.

    Examples
    --------

    Load and infer:

        >>> from gensim.test.utils import datapath
        >>>
        >>> cap_path = datapath("crime-and-punishment.bin")
        >>> fbkv = load_facebook_vectors(cap_path)
        >>>
        >>> 'landlord' in fbkv.key_to_index  # Word is out of vocabulary
        False
        >>> oov_vector = fbkv['landlord']
        >>>
        >>> 'landlady' in fbkv.key_to_index  # Word is in the vocabulary
        True
        >>> iv_vector = fbkv['landlady']

    See Also
    --------
    :func:`~gensim.models.fasttext.load_facebook_model` loads
    the full model, not just word embeddings, and enables you to continue
    model training.

    Fr   )r   r<   )r   r   r   s      rG   load_facebook_vectorsr     s!    j 'th5QQQJ=rH   Tc                 8   t          j        | d          5 }t          j        j                            |||          }ddd           n# 1 swxY w Y   t          |j        |j        |j	        |j
        t          |j        dk              t          |j        dk              |j        |j        |j        |j        |j                  }|j        |_        |j        |_        |j        |_        |j        |_        |                    dd           |j        j        d	         |_        |j                            |j                   |                    |j                   tA          |           |!                    d
d|j        j         d|j"                    |S )a  Load the input-hidden weight matrix from Facebook's native fasttext `.bin` output files.

    Parameters
    ----------
    model_file : str
        Full path to the FastText model file.
    encoding : str, optional
        Specifies the file encoding.
    full_model : boolean, optional
        If False, skips loading the hidden output matrix. This saves a fair bit
        of CPU time and RAM, but prevents training continuation.

    Returns
    -------
    :class: `~gensim.models.fasttext.FastText`
        The loaded model.

    rbr   Nr      )r    r'   r!   r2   r1   r%   rE   r*   r+   rC   rD   T)updater*   r   r8   zloaded z' weight matrix for fastText model from )msg)#r	   opengensimmodels_fasttext_binr7   r   dimwsepochnegintlossmodelrE   r*   tminnmaxnntokenscorpus_total_words	raw_vocabnwordsrQ   prepare_vocabvectors_ngramsshapenum_original_vectorsr<   init_post_loadrR   rO   _check_modeladd_lifecycle_eventname)r   r   r   finr   r   s         rG   r   r     s   & 
J	%	% \M',,S8PZ,[[\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Etwqv{qw!|x+sff  E  !yEkEO8EL|E 
tq111!"!1!7!:E	HA,---	!/***	_a&,__UXU]__     Ls   (A

AAc                    | j         j        | j         j        j        d         k    r*t	          d| j         j        d| j         j        d          t          | d          rQ| j        J| j         j        | j        j        d         k    r*t	          d| j         j        d| j         j        d          t          | j                   | j        k    r-t	          dt          | j                   d	| j        d
          t          | j                   | j	        k    r5t                              dt          | j                   | j	                   dS dS )zJModel sanity checks. Run after everything has been completely initialized.r   z.mismatch between vector size in model params (z) and model vectors ()rM   Nz) and trainables (z#mismatch between final vocab size (z' words), and expected number of words (z words)zPmismatch between final vocab size (%s words), and expected vocab size (%s words))r<   r    r   r   r   r   rM   rJ   r   rQ   rd   warning)r   s    rG   r   r   X  s_   t14.4Q77 
j   !$"5"5"5
 
 	
 q)  4qyq11 	*D$$$ad&9&9&9   14yyAH 
jAD				1888
 
 	
 14yyAL  
^IIq|	
 	
 	
 	
 	

 
rH   r   r   c                 ^    ||d}t           j        j                            | |||           dS )a\  Saves word embeddings to the Facebook's native fasttext `.bin` format.

    Notes
    ------
    Facebook provides both `.vec` and `.bin` files with their modules.
    The former contains human-readable vectors.
    The latter contains machine-readable vectors along with other model parameters.
    **This function saves only the .bin file**.

    Parameters
    ----------
    model : gensim.models.fasttext.FastText
        FastText model to be saved.
    path : str
        Output path and filename (including `.bin` extension)
    encoding : str, optional
        Specifies the file encoding. Defaults to utf-8.

    lr_update_rate : int
        This parameter is used by Facebook fasttext tool, unused by Gensim.
        It defaults to Facebook fasttext default value `100`.
        In very rare circumstances you might wish to fiddle with it.

    word_ngrams : int
        This parameter is used by Facebook fasttext tool, unused by Gensim.
        It defaults to Facebook fasttext default value `1`.
        In very rare circumstances you might wish to fiddle with it.

    Returns
    -------
    None
    )lr_update_rater:   N)r   r   r   r   )r   r   r   r   r:   fb_fasttext_parameterss         rG   save_facebook_modelr   x  s9    B 1?{[[
M$$UD2H(SSSSSrH   c                        e Zd Zdef fd	Ze fd            Z fdZd Z fdZ	 fdZ
d fd		Z fd
ZddZd Zd Zd Z xZS )r;   r   c                     t          t          |                               |||           || _        || _        || _        d| _        t          j        ||f|          | _	        d| _
        d| _        dS )a  Vectors and vocab for :class:`~gensim.models.fasttext.FastText`.

        Implements significant parts of the FastText algorithm.  For example,
        the :func:`word_vec` calculates vectors for out-of-vocabulary (OOV)
        entities.  FastText achieves this by keeping vectors for ngrams:
        adding the vectors for the ngrams of an entity yields the vector for the
        entity.

        Similar to a hashmap, this class keeps a fixed number of buckets, and
        maps all ngrams to buckets using a hash function.

        Parameters
        ----------
        vector_size : int
            The dimensionality of all vectors.
        min_n : int
            The minimum number of characters in an ngram
        max_n : int
            The maximum number of characters in an ngram
        bucket : int
            The number of buckets.
        count : int, optional
            If provided, vectors will be pre-allocated for at least this many vectors. (Otherwise
            they can be added later.)
        dtype : type, optional
            Vector dimensions will default to `np.float32` (AKA `REAL` in some Gensim code) unless
            another type is provided here.

        Attributes
        ----------
        vectors_vocab : np.array
            Each row corresponds to a vector for an entity in the vocabulary.
            Columns correspond to vector dimensions. When embedded in a full
            FastText model, these are the full-word-token vectors updated
            by training, whereas the inherited vectors are the actual per-word
            vectors synthesized from the full-word-token and all subword (ngram)
            vectors.
        vectors_ngrams : np.array
            A vector for each ngram across all entities in the vocabulary.
            Each row is a vector that corresponds to a bucket.
            Columns correspond to vector dimensions.
        buckets_word : list of np.array
            For each key (by its index), report bucket slots their subwords map to.

        )r    countr   Nr   T)r@   r;   rA   rC   rD   rE   r\   r^   zerosvectors_vocabr   compatible_hash)rB   r    rC   rD   rE   r   r   rF   s          rG   rA   zFastTextKeyedVectors.__init__  s}    \ 	"D))22{RW_d2eee

 Xuk&:%HHH"#rH   c                 F     t          t          |           j        |fi |S )a  Load a previously saved `FastTextKeyedVectors` model.

        Parameters
        ----------
        fname : str
            Path to the saved file.

        Returns
        -------
        :class:`~gensim.models.fasttext.FastTextKeyedVectors`
            Loaded model.

        See Also
        --------
        :meth:`~gensim.models.fasttext.FastTextKeyedVectors.save`
            Save :class:`~gensim.models.fasttext.FastTextKeyedVectors` model.

        )r@   r;   r7   )r   fname_or_handlerw   rF   s      rG   r7   zFastTextKeyedVectors.load  s+    ( 5u)3//4_OOOOOrH   c                 |    t          t          |           j        |i | t          | t                    st	          dt          |           z            t          | d          r	| j        du rt	          d          t          | d          s+t          | d          rt          dt                    | _
        t          | d	          s+t          | d
          rt          dt                    | _        t          | j
        j                  dk    rt          dt                    | _
        t          | j        j                  dk    rt          dt                    | _        t          | d          r| j        s|                                  t          | d          r| j        |                                  dS dS )r   z;Loaded object of type %s, not expected FastTextKeyedVectorsr   FzPre-gensim-3.8.x fastText models with nonstandard hashing are no longer compatible. Loading your old model into gensim-3.8.3 & re-saving may create a model compatible with gensim 4.x.r>   r   r   r   r?   r   r\   rK   N)r@   r;   r   
isinstance	TypeErrortyper   r   r   r=   r>   r?   rJ   r   r\   recalc_char_ngram_bucketsrK   rU   r   s      rG   r   z#FastTextKeyedVectors._load_specials  s   8"D))8$I&III$ 455 	hY\`ae\f\ffgggt.// 	43G53P 	v   t233 	;o8V8V 	;'+AT':':':D$t344 	<GW9X9X 	<(,Qd(;(;(;D%t'-..2 	;'+AT':':':D$t(.//!3 	<(,Qd(;(;(;D%t^,, 	-D4E 	-**,,,tY'' 	"4< 	"!!!!!	" 	"rH   c                 .    | j         dk    r	|| j        v S dS )a  Check if `word` or any character ngrams in `word` are present in the vocabulary.
        A vector for the word is guaranteed to exist if current method returns True.

        Parameters
        ----------
        word : str
            Input word.

        Returns
        -------
        bool
            True if `word` or any character ngrams in `word` are present in the vocabulary, False otherwise.

        Note
        ----
        This method **always** returns True with char ngrams, because of the way FastText works.

        If you want to check if a word is an in-vocabulary term, use this instead:

        .. pycon:

            >>> from gensim.test.utils import datapath
            >>> from gensim.models import FastText
            >>> cap_path = datapath("crime-and-punishment.bin")
            >>> model = FastText.load_fasttext_format(cap_path, full_model=False)
            >>> 'steamtrain' in model.wv.key_to_index  # If False, is an OOV term
            False

        r   T)rE   r`   )rB   rk   s     rG   __contains__z!FastTextKeyedVectors.__contains__  s&    < ;! 	4,,,4rH   c                 H     t          t          |           j        |i | dS )zSave object.

        Parameters
        ----------
        fname : str
            Path to the output file.

        See Also
        --------
        :meth:`~gensim.models.fasttext.FastTextKeyedVectors.load`
            Load object.

        N)r@   r;   r   r   s      rG   r   zFastTextKeyedVectors.save&  s.     	/"D)).??????rH   c           	          t          |                              ddg          }t          t          |                               |||||||          S )zCArrange any special handling for the gensim.utils.SaveLoad protocolr\   rK   )setunionr@   r;   _save_specials)	rB   fname
separately	sep_limitignorepickle_protocolcompresssubnamerF   s	           rG   r   z#FastTextKeyedVectors._save_specials6  sX     V""NI#@AA)400??:y&/8WV V 	VrH   Fc                 :   || j         v r*t          t          |                               ||          S | j        dk    rt          d          t          j        | j        j	        d         t          j
                  }| j        }t          || j        | j        | j                  }t          |          dk    rt                              d|           |S |D ]}|||         z  }|r"|t          j                            |          z  S |t          |          z  S )a  Get `word` representations in vector space, as a 1D numpy array.

        Parameters
        ----------
        word : str
            Input word.
        norm : bool, optional
            If True, resulting vector will be L2-normalized (unit Euclidean length).

        Returns
        -------
        numpy.ndarray
            Vector representation of `word`.

        Raises
        ------
        KeyError
            If word and all its ngrams not in vocabulary.

        )normr   z3cannot calculate vector for OOV word without ngramsr   r   z=could not extract any ngrams from %r, returning origin vector)r`   r@   r;   
get_vectorrE   KeyErrorr^   r   r   r   r   ra   rC   rD   rJ   rd   r   linalgr   )rB   rk   r   word_vecngram_weightsngram_hashesnhrF   s          rG   r   zFastTextKeyedVectors.get_vector=  s   * 4$$ 	4-t44??4?PPP[A 	4PQQQx 3 9! <BJOOOH /M*4TZUUL<  A% 
  ^`deee" . .M"-- 4")..":":::#l"3"333rH   c                 T    t          t          |                               |          S )a  Get a single 1-D vector representation for a given `sentence`.
        This function is workalike of the official fasttext's get_sentence_vector().

        Parameters
        ----------
        sentence : list of (str or int)
            list of words specified by string or int ids.

        Returns
        -------
        numpy.ndarray
            1-D numpy array representation of the `sentence`.

        )r@   r;   get_mean_vector)rB   sentencerF   s     rG   get_sentence_vectorz(FastTextKeyedVectors.get_sentence_vectorl  s$     )400@@JJJrH   c                 T   t          | j                  | j        f}t          || j        |          | _        | j        | j        f}t          || j        |dz             | _        |                                  d| _        | 	                                 | 
                                 dS )zRMake underlying vectors match 'index_to_key' size; random-initialize any new rows.)prior_vectorsr0   r   N)rJ   index_to_keyr    r   r   rE   r   allocate_vecattrsnormsr   rU   )rB   r0   vocab_shapengrams_shapes       rG   resize_vectorsz#FastTextKeyedVectors.resize_vectors}  s     4,--t/?@)+TEW^bcccT%56*<tGZaehiaijjj   
&&(((rH   c                    t          |           }|j        d         || j        z   k    s
J d            |j        d         | j        k    s
J d            t	          j        |d|ddf                   | _        t	          j        ||dddf                   | _        |                                  | 	                                 dS )a  Perform initialization after loading a native Facebook model.

        Expects that the vocabulary (self.key_to_index) has already been initialized.

        Parameters
        ----------
        fb_vectors : np.array
            A matrix containing vectors for all the entities, including words
            and ngrams.  This comes directly from the binary model.
            The order of the vectors must correspond to the indices in
            the vocabulary.

        r   zunexpected number of vectorsr   z unexpected vector dimensionalityN)
rJ   r   rE   r    r^   arrayr   r   r   rU   )rB   
fb_vectorsvocab_wordss      rG   r   z#FastTextKeyedVectors.init_post_load  s     $ii"kDK&??__A____"d&66ZZ8ZZZZ  Xj+qqq&ABB hz+,,/'BCC&&(((rH   c                 f   | j         dk    r| j        | _        dS | j        dd                                         | _        t	          | j                  D ]\\  }}| j        |         }|D ]"}| j        |xx         | j        |         z  cc<   #| j        |xx         t          |          dz   z  cc<   ]dS )zAdjust the vectors for words in the vocabulary.

        The adjustment composes the trained full-word-token vectors with
        the vectors of the subword ngrams, matching the Facebook reference
        implementation behavior.

        r   Nr   )	rE   r   rK   copy	enumerater  r\   r   rJ   )rB   i_ngram_bucketsr  s        rG   rU   z#FastTextKeyedVectors.adjust_vectors  s     ;! 	-DLF)!!!,1133d/00 	6 	6DAq -a0M# ; ;Q4#6r#::LOOOs=11A55OOOO		6 	6rH   c           	         | j         dk    r=t          j        g t          j                  gt	          | j                  z  | _        dS dgt	          | j                  z  | _        t          | j                  D ]L\  }}t          j        t          || j	        | j
        | j                   t          j                  | j        |<   MdS )z|
        Scan the vocabulary, calculate ngrams and their hashes, and cache the list of ngrams for each known word.

        r   r   N)rE   r^   r  uint32rJ   r  r\   r  ra   rC   rD   )rB   r  rk   s      rG   r   z.FastTextKeyedVectors.recalc_char_ngram_buckets  s     ;! 	!#"BI!>!>!> ?#dFWBXBX XDF!FS):%;%;; !233 	 	GAt#%8dj$*dkJJi$ $ $Da  	 	rH   r   )r   )r   r   r   r=   rA   r   r7   r   r   r   r   r   r  r  r   rU   r   r   r   s   @rG   r;   r;     sP       @A 5$ 5$ 5$ 5$ 5$ 5$n P P P P [P*" " " " "0! ! !F@ @ @ @ @ V V V V V-4 -4 -4 -4 -4 -4^K K K K K"     86 6 6&      rH   r;   c                     | j         \  }}d|z  d|z  }}|                    ||||f                              t                    }t	          | |g          S )z<Pad a matrix with additional rows filled with random values.g      g      ?)r   uniformastyper=   r   )r   new_rowsrandr  columnslowhighsuffixs           rG   _pad_randomr&    sX    JAwwgC\\#th%899@@FFF1f+rH   c                   
 | j         ^
}
|k    r| S |
k    sJ |6t          j        }|                    |           t	          | |
z
  |          } nt          j        | |g|
z
  z  g          } 
fd|                                D             }|                    
fd|                                D                        |                                D ]\  }}	||	k    sJ | |	|g         | ||	g<   | S )a  Restore the array to its natural shape, undoing the optimization.

    A packed matrix contains contiguous vectors for ngrams, as well as a hashmap.
    The hash map maps the ngram hash to its index in the packed matrix.
    To unpack the matrix, we need to do several things:

    1. Restore the matrix to its "natural" shape, where the number of rows
       equals the number of buckets.
    2. Rearrange the existing rows such that the hashmap becomes the identity
       function and is thus redundant.
    3. Fill the new rows with random values.

    Parameters
    ----------

    m : np.ndarray
        The matrix to restore.
    num_rows : int
        The number of rows that this array should have.
    hash2index : dict
        the product of the optimization we are undoing.
    seed : float, optional
        The seed for the PRNG.  Will be used to initialize new rows.
    fill : float or array or None, optional
        Value for new rows. If None (the default), randomly initialize.
    Returns
    -------
    np.array
        The unpacked matrix.

    Notes
    -----

    The unpacked matrix will reference some rows in the input matrix to save memory.
    Throw away the old matrix after calling this function, or use np.copy.

    Nc                 <    i | ]\  }}||cxk     rk     n n||S r   r   .0hr  	orig_rowss      rG   
<dictcomp>z_unpack.<locals>.<dictcomp>  s@    GGGVaQGGGGYGGGGGAqGGGrH   c                 (    i | ]\  }}|k    ||S r   r   r)  s      rG   r-  z_unpack.<locals>.<dictcomp>  s(    III&1a!y.IAIIIrH   )r   r^   randomr0   r&  concatenater   r   )r   num_rows
hash2indexr0   fill	more_dimsrand_objswapr+  r  r,  s             @rG   _unpackr7    s2   L GI	H  i 
A9d 8i/::NAvI)=>?@@ HGGGz//11GGGDKKIIIIJ$4$4$6$6IIIJJJ

  1Avq!fI1a&		HrH         c                 (    | t           z  t          k    S ro   )_MB_MASK	_MB_START)bs    rG   _is_utf8_continuer>  *  s    x<9$$rH   c                 F    t          | ||          }fd|D             }|S )az  Calculate the ngrams of the word and hash them.

    Parameters
    ----------
    word : str
        The word to calculate ngram hashes for.
    minn : int
        Minimum ngram length
    maxn : int
        Maximum ngram length
    num_buckets : int
        The number of buckets

    Returns
    -------
        A list of hashes (integers), one per each detected ngram.

    c                 4    g | ]}t          |          z  S r   )r   )r*  nri   s     rG   
<listcomp>z#ft_ngram_hashes.<locals>.<listcomp>B  s&    EEEmA,EEErH   )r   )rk   r   r   ri   encoded_ngramsrl   s      `  rG   ra   ra   .  s4    & *$d;;NEEEEnEEEFMrH   )keyedvectors)r   )r   T)r   r   r   )r   N)3r   loggingnumpyr^   r   r   r   r=   gensim.models._fasttext_binr   gensim.models.word2vecr   gensim.models.keyedvectorsr   r   r	   gensim.utilsr
   gensim.models.fasttext_innerr   r   r   r   r   !gensim.models.fasttext_corpusfiler   r   ImportError	NO_CYTHON	getLoggerr   rd   r   SaveLoadr   r   r   r   r   r   r   r   r;   r&  r7  r;  r<  r>  ra   gensim.modelsrD  r   rH   rG   <module>rR     s  l l\      / / / / / / / / / / " " " " + + + + + + A A A A A A A A       # # # # # #
              SRRRRRRRR   
/ 
	8	$	$u u u u ux u u up# # # # #EN # # #
B B B B B B B B  >K >K >K >KB6 6 6 6rA A A AH
 
 
@"T "T "T "TJm m m m m< m m m`	  J J J Jb 	% % %  2 ' & & & & &$8 ! ! !s   A A