// base/io-funcs.h

// Copyright 2009-2011 Microsoft Corporation; Saarland University;
// Jan Silovsky; Yanmin Qian
// 2016 Xiaohui Zhang

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED ON AN *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS
// OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY
// IMPLIED WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.

#ifndef KALDI_BASE_IO_FUNCS_H_
#define KALDI_BASE_IO_FUNCS_H_

// This header only contains some relatively low-level I/O functions.
// The full Kaldi I/O declarations are in ../util/kaldi-io.h
// and ../util/kaldi-table.h
// They were put in util/ in order to avoid making the Matrix library
// dependent on them.

#include <cctype>
#include <string>
#include <utility>
#include <vector>

#include "base/io-funcs-inl.h"
#include "base/kaldi-common.h"

namespace kaldi {
/*
This comment describes the Kaldi approach to I/O. All objects can be written
and read in two modes: binary and text. In addition, we want the I/O to keep
working if we redefine the typedef "BaseFloat" between float and double.
We also want to have control over whitespace in text mode without affecting
the meaning of the file, for pretty-printing purposes.

Errors are handled by throwing a KaldiFatalError exception.

For integer and floating-point types (and boolean values):

  WriteBasicType(std::ostream &, bool binary, const T&);
  ReadBasicType(std::istream &, bool binary, T*);

We expect these functions to be defined in such a way that they work when
the type T changes between float and double [so you can read float into double
and vice versa]. Note that for efficiency and space-saving reasons, the
Vector and Matrix classes do not use these functions [but they preserve the
type interchangeability in their own way].
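
A minimal standalone sketch of the two modes, not Kaldi's implementation.
SketchWriteInt32, SketchReadInt32 and SketchRoundTrip are hypothetical names,
and the one-byte size prefix used in binary mode is an assumption of the
sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical helper: writes a 32-bit integer in binary or text mode.
inline void SketchWriteInt32(std::ostream &os, bool binary, std::int32_t v) {
  if (binary) {
    os.put(static_cast<char>(sizeof(v)));  // size prefix, checked on read
    os.write(reinterpret_cast<const char *>(&v), sizeof(v));
  } else {
    os << v << ' ';  // text mode: the value followed by one space
  }
  if (os.fail()) throw std::runtime_error("write failure");
}

// Hypothetical helper: reads back what SketchWriteInt32 wrote.
inline void SketchReadInt32(std::istream &is, bool binary, std::int32_t *v) {
  assert(v != nullptr);
  if (binary) {
    int size = is.get();
    if (size != static_cast<int>(sizeof(*v)))
      throw std::runtime_error("unexpected size prefix");
    is.read(reinterpret_cast<char *>(v), sizeof(*v));
  } else {
    is >> *v;  // operator>> skips leading whitespace
  }
  if (is.fail()) throw std::runtime_error("read failure");
}

// Writes v to an in-memory stream and reads it back in the same mode.
inline std::int32_t SketchRoundTrip(bool binary, std::int32_t v) {
  std::stringstream ss;
  SketchWriteInt32(ss, binary, v);
  std::int32_t out = 0;
  SketchReadInt32(ss, binary, &out);
  return out;
}
```

In this sketch, a value written in either mode and read back in the same mode
comes back unchanged, and a failed read or write throws rather than returning
an error code.
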

For a class (or struct) C:

  class C {
    ..
    Write(std::ostream &, bool binary,
          [possibly extra optional args for specific classes]) const;
    Read(std::istream &, bool binary,
         [possibly extra optional args for specific classes]);
    ..
  }

NOTE: The only optional args actually used are the "add" arguments in the
Vector/Matrix classes, which specify whether the data being read should be
summed with the data already in the class.
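
The class convention, including an "add" argument, might look roughly like
the following. This is a sketch under stated assumptions: SketchCounter and
SketchAccumulate are hypothetical and are not part of Kaldi, and the binary
layout (raw bytes of one 64-bit integer) is chosen only to keep the example
short:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical class that follows the Write/Read convention.
class SketchCounter {
 public:
  // Zero-initialize so that Read(..., add = true) on a fresh object is safe.
  SketchCounter() : count_(0) {}
  explicit SketchCounter(std::int64_t count) : count_(count) {}

  void Write(std::ostream &os, bool binary) const {
    if (binary) {
      os.write(reinterpret_cast<const char *>(&count_), sizeof(count_));
    } else {
      os << count_ << ' ';
    }
    if (os.fail()) throw std::runtime_error("write failure");
  }

  // If add is true, the value read is summed with the value already held.
  void Read(std::istream &is, bool binary, bool add = false) {
    std::int64_t tmp = 0;
    if (binary) {
      is.read(reinterpret_cast<char *>(&tmp), sizeof(tmp));
    } else {
      is >> tmp;
    }
    if (is.fail()) throw std::runtime_error("read failure");
    count_ = add ? count_ + tmp : tmp;
  }

  std::int64_t count() const { return count_; }

 private:
  std::int64_t count_;
};

// Writes two counters to one stream, then accumulates them into one object.
inline std::int64_t SketchAccumulate(bool binary, std::int64_t a,
                                     std::int64_t b) {
  std::stringstream ss;
  SketchCounter(a).Write(ss, binary);
  SketchCounter(b).Write(ss, binary);
  SketchCounter sum;
  assert(sum.count() == 0);    // a default-constructed counter starts at zero
  sum.Read(ss, binary, true);  // 0 + a
  sum.Read(ss, binary, true);  // a + b
  return sum.count();
}
```

Reading with add set to true only gives a meaningful sum if the object starts
from a known value, which is why the default constructor zeroes the member.
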

For types which are typedefs involving STL classes, I/O is as follows:

  typedef std::vector<std::pair<A, B> > MyTypedefName;

The user should define something like:

  WriteMyTypedefName(std::ostream &, bool binary, const MyTypedefName &t);
  ReadMyTypedefName(std::istream &, bool binary, MyTypedefName *t);

The user would have to write these functions.

For a type std::vector<T>, where T is an integer type:

  void WriteIntegerVector(std::ostream &os, bool binary,
                          const std::vector<T> &v);
  void ReadIntegerVector(std::istream &is, bool binary, std::vector<T> *v);

For other types, e.g. vectors of pairs, the user should create a routine of
the type WriteMyTypedefName. This is to avoid introducing confusing templated
functions; we could easily create templated functions to handle most of these
cases but they would have to share the same name.
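
A sketch of what such a user-written pair of functions could look like for
one concrete typedef. Everything here is hypothetical (SketchPairs, the
SketchPutInt32/SketchGetInt32 helpers standing in for the basic-type
functions, and the "size followed by elements" layout); it only illustrates
the naming pattern:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::int32_t, std::int32_t> > SketchPairs;

// Hypothetical low-level helpers standing in for the basic-type functions.
inline void SketchPutInt32(std::ostream &os, bool binary, std::int32_t v) {
  if (binary) {
    os.write(reinterpret_cast<const char *>(&v), sizeof(v));
  } else {
    os << v << ' ';
  }
  if (os.fail()) throw std::runtime_error("write failure");
}

inline std::int32_t SketchGetInt32(std::istream &is, bool binary) {
  std::int32_t v = 0;
  if (binary) {
    is.read(reinterpret_cast<char *>(&v), sizeof(v));
  } else {
    is >> v;
  }
  if (is.fail()) throw std::runtime_error("read failure");
  return v;
}

// One explicitly named pair of functions for this particular typedef.
inline void WriteSketchPairs(std::ostream &os, bool binary,
                             const SketchPairs &t) {
  SketchPutInt32(os, binary, static_cast<std::int32_t>(t.size()));
  for (std::size_t i = 0; i < t.size(); i++) {
    SketchPutInt32(os, binary, t[i].first);
    SketchPutInt32(os, binary, t[i].second);
  }
}

inline void ReadSketchPairs(std::istream &is, bool binary, SketchPairs *t) {
  assert(t != nullptr);
  std::int32_t size = SketchGetInt32(is, binary);
  if (size < 0) throw std::runtime_error("negative size");
  t->clear();
  for (std::int32_t i = 0; i < size; i++) {
    std::int32_t first = SketchGetInt32(is, binary);
    std::int32_t second = SketchGetInt32(is, binary);
    t->push_back(std::make_pair(first, second));
  }
}

// Writes t to an in-memory stream and reads it back in the same mode.
inline SketchPairs SketchPairsRoundTrip(bool binary, const SketchPairs &t) {
  std::stringstream ss;
  WriteSketchPairs(ss, binary, t);
  SketchPairs out;
  ReadSketchPairs(ss, binary, &out);
  return out;
}
```
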

It also often happens that the user needs to write/read special tokens as part
of a file. These might be class headers, or separators/identifiers in the
class. We provide special functions for manipulating these. These special
tokens must be nonempty and must not contain any whitespace.

  void WriteToken(std::ostream &os, bool binary, const char*);
  void WriteToken(std::ostream &os, bool binary, const std::string & token);
  int Peek(std::istream &is, bool binary);
  void ReadToken(std::istream &is, bool binary, std::string *str);
  int PeekToken(std::istream &is, bool binary);

WriteToken writes the token and one space (whether in binary or text mode).

Peek returns the first character of the next token, by consuming whitespace
(in text mode) and then returning the peek() character. It returns -1 at EOF;
it doesn't throw. It's useful if a class can have various forms based on
typedefs and virtual classes, and wants to know which version to read.

ReadToken allows the caller to obtain the next token. PeekToken works like
Peek, except that if the first character is '<' it skips over it and returns
the following character, then attempts to put the '<' back so that a
subsequent call to ReadToken reads the whole token. This is useful when
different object types are written to the same file; using PeekToken one can
decide which of the objects to read.
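
The token conventions can be sketched with plain iostreams as follows. This
is an illustration only, not Kaldi's implementation: SketchWriteToken,
SketchPeek, SketchReadToken and SketchTokenDemo are hypothetical names, and
the "<Foo>" and "<Bar>" tokens are made up for the example:

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Writes a nonempty, whitespace-free token followed by exactly one space.
inline void SketchWriteToken(std::ostream &os, const std::string &token) {
  assert(!token.empty());
  for (char c : token) {
    if (std::isspace(static_cast<unsigned char>(c)))
      throw std::invalid_argument("token contains whitespace");
  }
  os << token << ' ';
  if (os.fail()) throw std::runtime_error("write failure");
}

// Skips whitespace in text mode, then returns the next character unread.
inline int SketchPeek(std::istream &is, bool binary) {
  if (!binary) is >> std::ws;
  return is.peek();
}

// Reads the next token and consumes the single space that follows it.
inline void SketchReadToken(std::istream &is, bool binary, std::string *str) {
  assert(str != nullptr);
  if (!binary) is >> std::ws;
  is >> *str;
  if (is.fail()) throw std::runtime_error("failed to read token");
  if (!std::isspace(is.peek()))
    throw std::runtime_error("token not followed by whitespace");
  is.get();
}

// Writes two tokens, reads the first, peeks, then reads the second.
inline std::string SketchTokenDemo() {
  std::stringstream ss;
  SketchWriteToken(ss, "<Foo>");
  SketchWriteToken(ss, "<Bar>");
  std::string first, second;
  SketchReadToken(ss, false, &first);
  int next = SketchPeek(ss, false);  // '<': peeking does not consume it
  SketchReadToken(ss, false, &second);
  return first + static_cast<char>(next) + second;
}
```

Because peeking leaves the stream position unchanged, the second read in
SketchTokenDemo still sees the complete "<Bar>" token; a reader can use the
peeked character to decide which kind of object to read next.
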

There is currently no special functionality for writing/reading strings (where
the strings contain data rather than "special tokens" that are whitespace-free
and nonempty). This is because Kaldi is structured in such a way that strings
don't appear, except as OpenFst symbol table entries (and these have their own
format).

NOTE: you should not call ReadBasicType and WriteBasicType with types
such as int and size_t, which are machine-dependent -- at least not
if you want your file formats to port between machines. Use int32 and
int64 where necessary. There is no way to detect this using compile-time
assertions because C++ only keeps track of the internal representation of
the type.
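
To make the point about machine-dependent types concrete, a small sketch
(SketchBinaryWidth and kIntIsInt32 are hypothetical names; the 4- and 8-byte
figures assume 8-bit bytes):

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Number of bytes that a raw binary dump of one value of type T occupies.
template <class T>
constexpr std::size_t SketchBinaryWidth() {
  return sizeof(T);
}

// Fixed-width types have the same width wherever they are available, so a
// file written on one machine has the layout expected on another.
static_assert(SketchBinaryWidth<std::int32_t>() == 4, "int32 is 4 bytes");
static_assert(SketchBinaryWidth<std::int64_t>() == 8, "int64 is 8 bytes");

// Where int is the type behind std::int32_t, the two names denote the same
// type, so a template receiving T cannot tell which name the caller used.
// The value of this constant depends on the platform.
constexpr bool kIntIsInt32 = std::is_same<int, std::int32_t>::value;
```

By contrast, SketchBinaryWidth<long>() and SketchBinaryWidth<std::size_t>()
can differ between platforms, which is what makes such types unsuitable for
portable binary files.
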
*/

/// \addtogroup io_funcs_basic
/// @{

/// WriteBasicType is the name of the write function for bool, integer types,
/// and floating-point types. They all throw on error.
template <class T>
void WriteBasicType(std::ostream& os, bool binary, T t);

/// ReadBasicType is the name of the read function for bool, integer types,
/// and floating-point types. They all throw on error.
template <class T>
void ReadBasicType(std::istream& is, bool binary, T* t);

// Declare specialization for bool.
template <>
void WriteBasicType<bool>(std::ostream& os, bool binary, bool b);

template <>
void ReadBasicType<bool>(std::istream& is, bool binary, bool* b);

// Declare specializations for float and double.
template <>
void WriteBasicType<float>(std::ostream& os, bool binary, float f);

template <>
void WriteBasicType<double>(std::ostream& os, bool binary, double f);

template <>
void ReadBasicType<float>(std::istream& is, bool binary, float* f);

template <>
void ReadBasicType<double>(std::istream& is, bool binary, double* f);

// Define ReadBasicType that accepts an "add" parameter to add to
// the destination. Caution: if used in Read functions, be careful
// to initialize the parameters concerned to zero in the default
// constructor.
template <class T>
inline void ReadBasicType(std::istream& is, bool binary, T* t, bool add) {
  if (!add) {
    ReadBasicType(is, binary, t);
  } else {
    T tmp = T(0);
    ReadBasicType(is, binary, &tmp);
    *t += tmp;
  }
}

/// Function for writing STL vectors of integer types.
template <class T>
inline void WriteIntegerVector(std::ostream& os, bool binary,
                               const std::vector<T>& v);

/// Function for reading STL vectors of integer types.
template <class T>
inline void ReadIntegerVector(std::istream& is, bool binary, std::vector<T>* v);

/// Function for writing STL vectors of pairs of integer types.
template <class T>
inline void WriteIntegerPairVector(std::ostream& os, bool binary,
                                   const std::vector<std::pair<T, T> >& v);

/// Function for reading STL vectors of pairs of integer types.
template <class T>
inline void ReadIntegerPairVector(std::istream& is, bool binary,
                                  std::vector<std::pair<T, T> >* v);

/// The WriteToken functions are for writing nonempty sequences of non-space
/// characters. They are not for general strings.
void WriteToken(std::ostream& os, bool binary, const char* token);
void WriteToken(std::ostream& os, bool binary, const std::string& token);

/// Peek consumes whitespace (if binary == false) and then returns the peek()
/// value of the stream.
int Peek(std::istream& is, bool binary);

/// ReadToken gets the next token and puts it in str (exception on failure). If
/// PeekToken() was called previously, the stream may have failed to unget the
/// starting '<' character. In this case ReadToken() returns the token string
/// without the leading '<'. You must be prepared to handle this case.
/// ExpectToken() handles this internally, and is not affected.
void ReadToken(std::istream& is, bool binary, std::string* token);

/// PeekToken will return the first character of the next token, or -1 if end of
/// file. It's the same as Peek(), except if the first character is '<' it will
/// skip over it and will return the next character. It will attempt to unget
/// the '<' so the stream is where it was before you did PeekToken(); however,
/// this is not guaranteed (see ReadToken()).
int PeekToken(std::istream& is, bool binary);

/// ExpectToken tries to read in the given token, and throws an exception
/// on failure.
void ExpectToken(std::istream& is, bool binary, const char* token);
void ExpectToken(std::istream& is, bool binary, const std::string& token);

/// ExpectPretty attempts to read the text in "token", but only in non-binary
/// mode. Throws exception on failure. It expects an exact match except that
/// arbitrary whitespace matches arbitrary whitespace.
void ExpectPretty(std::istream& is, bool binary, const char* token);
void ExpectPretty(std::istream& is, bool binary, const std::string& token);

/// @} end "addtogroup io_funcs_basic"

/// InitKaldiOutputStream initializes an opened stream for writing by writing an
/// optional binary header and modifying the floating-point precision; it will
/// typically not be called by users directly.
inline void InitKaldiOutputStream(std::ostream& os, bool binary);

/// InitKaldiInputStream initializes an opened stream for reading by detecting
/// the binary header and setting the "binary" value appropriately;
/// it will typically not be called by users directly.
inline bool InitKaldiInputStream(std::istream& is, bool* binary);

}  // end namespace kaldi.

#endif  // KALDI_BASE_IO_FUNCS_H_