Ruby - Scraper concatenate strings

Hello, I'm making a Ruby web scraper to gather some info. In the HTML of the page that I want to scrape, there are 3 spans with the same class per article: <article> ... <span class="item-detail"> foo <small></small> </span> <span class="item-detail"> bar <small>bar</small> </span> <span class="item-detail"> <small> foo bar</small> </span> ... </article> Some of the articles don't have the last span. For now I only have: page.css('span.item-detail').each do |line| line.text end I'm using Nokogiri and open-uri. What I want is to concatenate the 3 spans (some articles only have two spans in the "item-detail" class) and print them to the screen. Can anyone help me, please?

Posted over 3 years ago by João Belo
Posted over 3 years ago by João Belo

Sorry about the format of the post, I will post it again more clearly:
I'm making a Ruby web scraper to gather some info. In the HTML of the page that I want to scrape, there are 3 spans with the same class per article:

<article>
...
<span class="item-detail">
foo
<small></small>
</span>

<span class="item-detail">
bar
<small>bar</small>
</span>

<span class="item-detail">
<small> foo bar</small>
</span>
...
</article>
However, some of the articles don't have the last span.

For now, I have been using this code:

# First loop: print the titles
page.css('a.item-link').each do |line|
  puts line.text
end
# Second loop: print the prices
page.css('span.item-price').each do |line|
  puts line.text
end
# Third loop: read the details (nothing is printed yet)
page.css('span.item-detail').each do |line|
  line.text
end
I'm using the Nokogiri gem and open-uri to retrieve and parse the file.

How can I concatenate the 3 spans (some articles only have two spans in the "item-detail" class) and print them to the screen?

My desired output is:

title 1
title 2
title 3
price 1
price 2
price 3
details 1 (first span)
details 2 (first span)
details 3 (first span)
details 1 (second span)
details 2 (second span)
details 3 (second span)
details 1 (third span)
details 2 (third span)
details 3 (third span)
Some of the articles don't have the third span; in that case I will print "". My goal is to write the results to a .csv file.

Posted over 3 years ago by Alex Yang

I'm not sure exactly what you're trying to do. Are you trying to concatenate the HTML? If so, you should consider using arrays. The basic approach would be to initialize an empty array, and within the loop, append each span element to fill up the array. Here's the Ruby reference doc for arrays:
http://ruby-doc.org/core-2.2.0/Array.html
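
As a rough sketch of the array approach described above (not necessarily what the poster ended up using): the helper name detail_columns and the sample strings are made up for illustration, and the sample data stands in for what Nokogiri would return, so the snippet runs without fetching a page. It assumes each article yields at most three detail spans and pads a missing one with "".

```ruby
# Hypothetical helper (not from the thread): turn the detail texts of one
# article into a fixed number of columns, padding missing spans with "".
def detail_columns(texts, width = 3)
  # Collapse the newlines and extra spaces that surround the span text.
  cleaned = texts.map { |t| t.split.join(' ') }
  missing = [width - cleaned.length, 0].max
  cleaned + Array.new(missing, '')
end

# Stand-in for the scraped data. With Nokogiri this could be built as:
#   page.css('article').map { |a| a.css('span.item-detail').map(&:text) }
articles = [
  [" foo  ", " bar bar ", "  foo bar"], # article with all three spans
  [" foo  ", " bar bar "]               # article missing the last span
]

rows = []                        # start with an empty array
articles.each do |texts|
  rows << detail_columns(texts)  # append one row per article
end

# Concatenate the spans of each article and print them.
rows.each { |row| puts row.join(', ') }
```

Grouping the spans per article (rather than looping over every span on the page at once) is what makes it possible to tell which article is missing its third span. Each row is also already in a shape that could be written out as one line of a .csv file.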

Hope that helps!
