In this paper, we propose text style transfer (TST) and text-to-speech synthesis (TTS) using disfluency annotation for the application of “spontaneous speech synthesis from written text.” TTS technology has progressed significantly, achieving human-like naturalness in reading-style speech generation, but it is still developing when it comes to producing more spontaneous, human-like speech. Moreover, existing spontaneous speech synthesizers assume that the input text already contains spontaneous elements such as disfluencies. We therefore aim to synthesize spontaneous speech with disfluencies on the basis of written materials that contain no disfluent parts. Specifically, we train the TST and TTS systems for lecture speech generation by tagging disfluencies with special symbols, or converting disfluencies into special symbols, to enhance each model’s linguistic and acoustic control over disfluencies. We combine the TST and TTS systems using disfluency annotation into a lecture speech generation system, and we demonstrate the effectiveness of our method by comparing objective and subjective evaluation results with those obtained without disfluency annotation.
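The two annotation schemes mentioned above can be illustrated with a minimal sketch. The tag symbol `<F>`, the filler list, and the function names below are hypothetical illustrations, not taken from the paper:

```python
# Minimal sketch of two hypothetical disfluency-annotation schemes:
#  - "tag" mode wraps each disfluent token in special symbols, preserving it
#    (useful when the model should keep acoustic control over the filler);
#  - "convert" mode replaces the disfluent token with a single special symbol
#    (useful when the written text should stay free of disfluent words).
# The filler inventory here is a toy example.
FILLERS = {"um", "uh", "er"}

def annotate(tokens, mode="tag", symbol="<F>"):
    out = []
    for t in tokens:
        if t.lower() in FILLERS:
            # Either mark the filler in place or collapse it to the symbol.
            out.append(f"{symbol}{t}{symbol}" if mode == "tag" else symbol)
        else:
            out.append(t)
    return out

tokens = ["So", "um", "we", "train", "the", "model"]
print(annotate(tokens, mode="tag"))      # filler kept, wrapped in symbols
print(annotate(tokens, mode="convert"))  # filler replaced by the symbol
```

In a TST-plus-TTS pipeline along these lines, the style-transfer model would learn where to insert such symbols into clean written text, and the synthesizer would learn how to realize them acoustically.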