// itf/decodable-itf.h

// Copyright 2009-2011  Microsoft Corporation;  Saarland University;
//                      Mirko Hannemann;  Go Vivace Inc.;
//                2013  Johns Hopkins University (author: Daniel Povey)

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//  http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS
// OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY
// IMPLIED WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#ifndef KALDI_ITF_DECODABLE_ITF_H_
#define KALDI_ITF_DECODABLE_ITF_H_ 1
#include "base/kaldi-common.h"

namespace kaldi {
/// @ingroup Interfaces
/// @{

/**
DecodableInterface provides a link between the (acoustic-modeling and
feature-processing) code and the decoder.  The idea is to make this
interface as small as possible, and to make it as agnostic as possible about
the form of the acoustic model (e.g. don't assume the probabilities are a
function of just a vector of floats), and about the decoder (e.g. don't
assume it accesses frames in strict left-to-right order).  For normal
models, without online operation, the "decodable" subclass will just be a
wrapper around a matrix of features and an acoustic model, and it will
answer the question 'what is the acoustic likelihood for this index and this
frame?'.

For online decoding, where the features arrive in real time, it is
important to understand the IsLastFrame() and NumFramesReady() functions.
There are two ways these are used: the old online-decoding code, in
../online/, and the new online-decoding code, in ../online2/.  In the old
online-decoding code, the decoder would do:
\code{.cc}
for (int frame = 0; !decodable.IsLastFrame(frame); frame++) {
  // Process this frame
}
\endcode
and the call to IsLastFrame() would block if the features had not arrived yet.
The decodable object would have to know when to terminate the decoding.  This
online-decoding mode is still supported; it is what happens when you call,
for example, LatticeFasterDecoder::Decode().

We realized that this "blocking" mode of decoding is not very convenient
because it forces the program to be multi-threaded and makes it complex to
control endpointing.  In the "new" decoding code, you don't call (for
example) LatticeFasterDecoder::Decode(); instead you call
LatticeFasterDecoder::InitDecoding(), and then each time you get more
features, you provide them to the decodable object and call
LatticeFasterDecoder::AdvanceDecoding(), which does something like this:
\code{.cc}
while (num_frames_decoded_ < decodable.NumFramesReady()) {
  // Decode one more frame [increments num_frames_decoded_]
}
\endcode
So the decodable object never has IsLastFrame() called.  For decoding where
you are starting with a matrix of features, the NumFramesReady() function
will always just return the number of frames in the file, and IsLastFrame()
will return true for the last frame.
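
As a rough illustration of the matrix case, a decodable backed by
precomputed scores might look like the sketch below.  It is not Kaldi code:
MatrixDecodableSketch, the nested-vector storage, and the BaseFloat/int32
typedefs are stand-ins chosen so the example compiles on its own.  A real
implementation would derive from DecodableInterface and compute its scores
from features and an acoustic model.

```cpp
// Stand-alone sketch: the types below are simplified stand-ins, not Kaldi's.
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

typedef float BaseFloat;  // stand-in for kaldi::BaseFloat (assumption)
typedef int32_t int32;    // stand-in for kaldi::int32 (assumption)

// Hypothetical decodable wrapping precomputed log-likelihoods:
// one row per frame, one column per index (indices are one-based).
class MatrixDecodableSketch {
 public:
  explicit MatrixDecodableSketch(std::vector<std::vector<BaseFloat> > loglikes)
      : loglikes_(std::move(loglikes)) {}

  BaseFloat LogLikelihood(int32 frame, int32 index) {
    // The caller is expected to have checked NumFramesReady() > frame.
    assert(frame >= 0 && frame < NumFramesReady());
    assert(index >= 1 && index <= NumIndices());
    return loglikes_[frame][index - 1];  // map one-based index to a column
  }

  // All frames are available up front, so this value never changes.
  int32 NumFramesReady() const { return static_cast<int32>(loglikes_.size()); }

  // True only for the final frame (and for frame -1 if there are no frames).
  bool IsLastFrame(int32 frame) const { return frame == NumFramesReady() - 1; }

  int32 NumIndices() const {
    return loglikes_.empty() ? 0 : static_cast<int32>(loglikes_[0].size());
  }

 private:
  std::vector<std::vector<BaseFloat> > loglikes_;
};
```

With three frames stored, NumFramesReady() returns 3 and IsLastFrame(2) is
true; IsLastFrame(-1) is true only when no frames are stored.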

For truly online decoding, the "old" online decodable objects in ../online/
have a "blocking" IsLastFrame() and will crash if you call NumFramesReady().
The "new" online decodable objects in ../online2/ return the number of frames
currently accessible if you call NumFramesReady().  You will likely not need
to call IsLastFrame(), but we implement it to return true only for the last
frame of the file, and only once we've decided to terminate decoding.
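
The non-blocking pattern can be sketched in isolation as follows.  This is
only an illustration under simplifying assumptions, not the ../online2/
implementation: GrowingDecodableSketch, DecoderSketch, AcceptFrames() and
InputFinished() are hypothetical names, each frame is reduced to a single
score, and the typedefs stand in for Kaldi's own types.

```cpp
// Stand-alone sketch: simplified stand-ins, not Kaldi's classes.
#include <cassert>
#include <cstdint>
#include <vector>

typedef float BaseFloat;  // stand-in for kaldi::BaseFloat (assumption)
typedef int32_t int32;    // stand-in for kaldi::int32 (assumption)

// Hypothetical decodable whose frames arrive in chunks over time.
class GrowingDecodableSketch {
 public:
  GrowingDecodableSketch() : input_finished_(false) {}

  // Called by the feature pipeline whenever new per-frame scores arrive.
  void AcceptFrames(const std::vector<BaseFloat> &scores) {
    assert(!input_finished_);
    scores_.insert(scores_.end(), scores.begin(), scores.end());
  }

  // Signals that no further frames will be supplied.
  void InputFinished() { input_finished_ = true; }

  // Non-blocking: reports only the frames that are accessible right now.
  int32 NumFramesReady() const { return static_cast<int32>(scores_.size()); }

  // True for the final frame, but only once input has been declared finished.
  bool IsLastFrame(int32 frame) const {
    return input_finished_ && frame == NumFramesReady() - 1;
  }

 private:
  std::vector<BaseFloat> scores_;
  bool input_finished_;
};

// Hypothetical decoder that mirrors the AdvanceDecoding() loop shown earlier.
class DecoderSketch {
 public:
  DecoderSketch() : num_frames_decoded_(0) {}

  void AdvanceDecoding(const GrowingDecodableSketch &decodable) {
    while (num_frames_decoded_ < decodable.NumFramesReady()) {
      ++num_frames_decoded_;  // a real decoder would process the frame here
    }
  }

  int32 NumFramesDecoded() const { return num_frames_decoded_; }

 private:
  int32 num_frames_decoded_;
};
```

The caller alternates AcceptFrames() and AdvanceDecoding(); each call to
AdvanceDecoding() returns as soon as the frames that are ready have been
consumed, so the loop never waits for input.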
*/
class DecodableInterface {
 public:
  /// Returns the log likelihood, which will be negated in the decoder.
  /// The "frame" starts from zero.  You should verify that
  /// NumFramesReady() > frame before calling this.
  virtual BaseFloat LogLikelihood(int32 frame, int32 index) = 0;

  /// Returns true if this is the last frame.  Frames are zero-based, so the
  /// first frame is zero.  IsLastFrame(-1) will return false, unless the file
  /// is empty (a case that I'm not sure all the code will handle, so be
  /// careful).  Caution: the behavior of this function in an online setting
  /// is being changed somewhat.  In future it may return false while we
  /// haven't yet decided to terminate decoding, and true once we have.  The
  /// longer-term plan is to rely more on NumFramesReady(): IsLastFrame()
  /// would then always return false in an online-decoding setting, and would
  /// only return true in a decoding-from-matrix setting where we want to
  /// allow the last delta or LDA features to be flushed out for compatibility
  /// with the baseline setup.
  virtual bool IsLastFrame(int32 frame) const = 0;

  /// Returns the number of frames currently available for this decodable
  /// object.  This is for use in setups where you don't want the decoder to
  /// block while waiting for input.  It was added in Jan 2014, and I hope,
  /// going forward, to rely on this mechanism more than IsLastFrame() to
  /// know when to stop decoding.
  virtual int32 NumFramesReady() const {
    KALDI_ERR << "NumFramesReady() not implemented for this decodable type.";
    return -1;  // Not reached if KALDI_ERR throws; avoids a missing-return warning.
  }

  /// Returns the number of states in the acoustic model
  /// (they will be indexed one-based, i.e. from 1 to NumIndices();
  /// this is for compatibility with OpenFst).
  virtual int32 NumIndices() const = 0;

  virtual ~DecodableInterface() {}
};
/// @}
}  // namespace kaldi

#endif  // KALDI_ITF_DECODABLE_ITF_H_