Is this format going to be consistent? If so, you can simply call nextElementSibling() twice, starting from the strong element's parent (the p). Note that nextSibling can also return whitespace text nodes, so the element variant is safer here.
If it's going to vary, you'll need to decide for yourself when to stop iterating through the siblings, for example by checking whether the sibling contains a strong element.
It all depends on the full context.
Here's an example with basic loops. You may want to add more checks or better queries for a different situation.
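import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect(url).get();
List<Elements> data = new ArrayList<>();
Elements chapters = doc.select("p > strong");
for (Element chapter : chapters) {
    if (!chapter.ownText().toLowerCase().contains("chapter"))
        continue; // this strong element isn't actually a chapter heading
    List<Element> siblings = new ArrayList<>();
    // the chapter's content follows the heading's parent p, not the strong itself
    Element next = chapter.parent().nextElementSibling();
    while (next != null) {
        // a sibling p that wraps a "chapter" strong marks the next heading
        Element heading = next.select("strong").first();
        if (heading != null && heading.ownText().toLowerCase().contains("chapter"))
            break; // we've reached the end of this chapter
        siblings.add(next);
        next = next.nextElementSibling();
    }
    data.add(new Elements(siblings));
}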
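If the nested loops get unwieldy, the same grouping can be done in a single pass over the paragraphs. This is only a rough sketch of that idea in plain Java, without jsoup, so it can be run on its own. The class name ChapterGrouping, the helper groupByHeading, and the '#' prefix standing in for a p that wraps a strong are all made up for illustration; none of them come from jsoup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class ChapterGrouping {

    // Walks the items once, opening a new group at every heading and
    // appending each non-heading item to the most recent group.
    // Items that appear before the first heading are skipped.
    static <T> Map<T, List<T>> groupByHeading(List<T> items, Predicate<T> isHeading) {
        Map<T, List<T>> groups = new LinkedHashMap<>(); // keeps document order
        List<T> current = null;
        for (T item : items) {
            if (isHeading.test(item)) {
                current = new ArrayList<>();
                groups.put(item, current);
            } else if (current != null) {
                current.add(item);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the page's paragraphs: a leading '#' marks a heading.
        List<String> paragraphs = List.of(
                "#Chapter 1", "first", "second",
                "#Chapter 2", "third");
        Map<String, List<String>> chapters =
                groupByHeading(paragraphs, p -> p.startsWith("#"));
        System.out.println(chapters);
        // prints {#Chapter 1=[first, second], #Chapter 2=[third]}
    }
}
```

With jsoup you would presumably pass the list of p elements and a predicate that checks for a "chapter" strong child. Because headings are used as map keys, two headings that compare equal would overwrite each other, so a list of pairs may be a better fit if that can happen.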