You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

116 lines
5.0 KiB

  1. // fstext/determinize-star.h
  2. // Copyright 2009-2011 Microsoft Corporation
  3. // 2014 Guoguo Chen
  4. // 2015 Hainan Xu
  5. // See ../../COPYING for clarification regarding multiple authors
  6. //
  7. // Licensed under the Apache License, Version 2.0 (the "License");
  8. // you may not use this file except in compliance with the License.
  9. // You may obtain a copy of the License at
  10. //
  11. // http://www.apache.org/licenses/LICENSE-2.0
  12. //
  13. // THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  14. // KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
  15. // WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
  16. // MERCHANTABLITY OR NON-INFRINGEMENT.
  17. // See the Apache 2 License for the specific language governing permissions and
  18. // limitations under the License.
  19. #ifndef KALDI_FSTEXT_DETERMINIZE_STAR_H_
  20. #define KALDI_FSTEXT_DETERMINIZE_STAR_H_
  21. #include <fst/fst-decl.h>
  22. #include <fst/fstlib.h>
  23. #include <algorithm>
  24. #include <map>
  25. #include <set>
  26. #include <stdexcept> // this algorithm uses exceptions
  27. #include <vector>
  28. namespace fst {
  29. /// \addtogroup fst_extensions
  30. /// @{
  31. // For example of usage, see test-determinize-star.cc
  32. /*
  33. DeterminizeStar implements determinization with epsilon removal, which we
  34. distinguish with a star.
  35. We define a determinized* FST as one in which no state has more than one
  36. transition with the same input-label. Epsilon input labels are not allowed
  37. except starting from states that have exactly one arc exiting them (and are
  38. not final). [In the normal definition of determinized, epsilon-input labels
  39. are not allowed at all, whereas in Mohri's definition, epsilons are treated
  40. as ordinary symbols]. The determinized* definition is intended to simulate
  41. the effect of allowing strings of output symbols at each state.
  42. The algorithm implemented here takes an Fst<Arc>, and a pointer to a
  43. MutableFst<Arc> where it puts its output. The weight type is assumed to be a
  44. float-weight. It does epsilon removal and determinization.
  45. This algorithm may fail if the input has epsilon cycles under
  46. certain circumstances (i.e. the semiring is non-idempotent, e.g. the log
  47. semiring, or there are negative cost epsilon cycles).
  48. This implementation is much less fancy than the one in fst/determinize.h, and
  49. does not have an "on-demand" version.
  50. The algorithm is a fairly normal determinization algorithm. We keep in
  51. memory the subsets of states, together with their leftover strings and their
  52. weights. The only difference is we detect input epsilon transitions and
  53. treat them "specially".
  54. */
  55. // This algorithm will be slightly faster if you sort the input fst on input
  56. // label.
  57. /**
  58. This function implements the normal version of DeterminizeStar, in which the
  59. output strings are represented using sequences of arcs, where all but the
  60. first one has an epsilon on the input side. The debug_ptr argument is an
  61. optional pointer to a bool that, if it becomes true while the algorithm is
  62. executing, the algorithm will print a traceback and terminate (used in
  63. fstdeterminizestar.cc debug non-terminating determinization).
  64. If max_states is positive, it will stop determinization and throw an
  65. exception as soon as the max-states is reached. This can be useful in test.
  66. If allow_partial is true, the algorithm will output partial results when the
  67. specified max_states is reached (when larger than zero), instead of throwing
  68. out an error.
  69. Caution, the return status is un-intuitive: this function will return false
  70. if determinization completed normally, and true if it was stopped early by
  71. reaching the 'max-states' limit, and a partial FST was generated.
  72. */
  73. template <class F>
  74. bool DeterminizeStar(F& ifst, MutableFst<typename F::Arc>* ofst, // NOLINT
  75. float delta = kDelta, bool* debug_ptr = NULL,
  76. int max_states = -1, bool allow_partial = false);
  77. /* This is a version of DeterminizeStar with a slightly more "natural" output
  78. format, where the output sequences are encoded using the GallicArc (i.e. the
  79. output symbols are strings. If max_states is positive, it will stop
  80. determinization and throw an exception as soon as the max-states is reached.
  81. This can be useful in test. If allow_partial is true, the algorithm will
  82. output partial results when the specified max_states is reached (when larger
  83. than zero), instead of throwing out an error.
  84. Caution, the return status is un-intuitive: this function will return false
  85. if determinization completed normally, and true if it was stopped early by
  86. reaching the 'max-states' limit, and a partial FST was generated.
  87. */
  88. template <class F>
  89. bool DeterminizeStar(F& ifst, // NOLINT
  90. MutableFst<GallicArc<typename F::Arc> >* ofst,
  91. float delta = kDelta, bool* debug_ptr = NULL,
  92. int max_states = -1, bool allow_partial = false);
  93. /// @} end "addtogroup fst_extensions"
  94. } // end namespace fst
  95. #include "fstext/determinize-star-inl.h"
  96. #endif // KALDI_FSTEXT_DETERMINIZE_STAR_H_